SEX ROLE AND COMMUNITY VARIABILITY
IN TEST PERFORMANCES¹


CARSON McGUIRE²


University of Texas


In the Human Talent Project, which has dealt so far with boys
and girls in their junior high school years, one major problem
has been the identification of suitable measures which combine
effectively to explain and forecast talented behavior among
adolescents (Hindsman & Duke, 1960). The proposed psychological
model for research in human talent (McGuire, 1960) postulates
three categories of variables. Each kind of behavior to be
explained or predicted is, in large part, a function of (a)
potentialities of an individual pertinent to that behavior, (b)
expectations regarding supportive or nonsupportive responses of
self and others, and (c) pressures imposed upon the person by
parents, age-mates, and teachers. The pressures form a part of
what Goethals (1958) terms the "context" in his framework for
educational research. Not only the interpersonal but also
multipersonal situations or institutional settings have to be
taken into account.


¹ The research reported herein was supported through the
Cooperative Research Program of the Office of Education, United
States Department of Health, Education, and Welfare.

² The author is indebted to Earl Jennings and F. J. King (now at
Florida State University), research associates of the Human
Talent Project, and to Kathleen Silva, statistical clerk,
Laboratory of Human Behavior, for valuable assistance in carrying
out the numerous computations for this research.


This paper reports a series of tests 
of the proposition that sex role and 
school location have a moderating in- 
fluence upon the performances of jun- 
ior high school students on cognitive 
and noncognitive instruments. Each 
of the test variables has been selected 
as a possible measure of the poten- 
tialities, expectations, or pressures. If 
the results support the proposition, the 
model should be modified by two 
kinds of moderator variables. One is 
variability of certain test variables in 
terms of sex-role identification and 
the sex-typing of socialization pres- 
sures upon boys and girls (More, 
1953). The other represents differences 
in community context and patterns of 
educational experiences from one loca- 
tion to another with accompanying 
variations in what is expected and 
what may be learned (Ferguson, 
1954). Indirectly, the series of studies 
permits an evaluation of the useful- 
ness of the context variables suggested 
by Goethals (1958). More directly, 
the results provide some necessary 
data about relations among variables 
which may be used to define the di- 
mensions of and to predict talented 
behavior. 


METHOD 


The research team was quite aware of 
the literature on a wide range of sex and 
other human differences, much of it ably 
summarized by Tyler (1956) and by Anastasi
(1958). No studies of individual differences,
however, provided data upon the
complex interplay among variables encountered
when subjects of both sexes were
drawn from schools in several communities
and when they represented different family
backgrounds and levels of mental function.
Therefore three different tests of the proposition
were designed, each using a multivariate
approach made possible by programing
analyses for a modern high speed
computer.


Analyses of Variance 


The first design made provisions for two 
other sources of variation in test perform- 
ance in addition to sex role and school con- 
text. One of these components was family 
status, representing variations in culturally- 
typed experiences and the socialization pres- 
sures of different family backgrounds in 
each community (Auld, 1952; McGuire, 
1953). The other was level of mental func- 
tion, to test the interaction of cognitive 
functioning with the other three components 
of variance. Sex role (S), family status (F), 
mental function (M), and school location 
(L) were combined in a 2 X 3 X 3 X 4 
factorial design with two replications in each 
cell. The replications provided an error 
term with 72 degrees of freedom for testing 
main effects and interactions among the 
postulated sources of variation in perform- 
ances upon 22 cognitive and 22 noncognitive 
test variables. 
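
The degrees of freedom quoted for this design can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python, not part of the original IBM 650 analysis; the names `levels`, `effect_df`, and `all_effects` are hypothetical.

```python
from itertools import combinations, product

# Levels of each factor as described in the text:
# sex role (S), family status (F), mental function (M), school location (L).
levels = {"S": 2, "F": 3, "M": 3, "L": 4}
replications = 2  # two subjects per cell

# Every combination of factor levels is one cell of the factorial design.
cells = list(product(*(range(k) for k in levels.values())))
n_cells = len(cells)
n_subjects = n_cells * replications

# Replication within cells supplies the error (deviation) term.
df_error = n_cells * (replications - 1)


def effect_df(effect):
    """Degrees of freedom for a main effect or interaction, e.g. 'FML'."""
    df = 1
    for factor in effect:
        df *= levels[factor] - 1
    return df


# All 15 main effects and interactions among S, F, M, and L.
all_effects = ["".join(c) for r in range(1, 5) for c in combinations("SFML", r)]

print(n_cells, n_subjects, df_error)
print({e: effect_df(e) for e in all_effects})
```

The effect degrees of freedom sum to 71, which together with the 72 for error accounts for the 143 available from 144 subjects.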

Subjects. The subjects were 144 junior 
high school students, 72 boys and 72 girls, 
drawn by random procedures to fit the re- 
search design from a total population of 
1,417 who had responded to each of the 
tests in the seventh grade. Each of the 
four communities, designated A, B, C, and 
D, participating in the Human Talent Proj- 
ect was represented by 36 subjects. Of these 
boys and girls, equal numbers were selected 
from high (HFS), middle (MFS), and low 
(LFS) status homes as determined by an 
index of social status for each family 
(McGuire & White, 1955). Each subsample 
was subdivided for level of mental function, 
the measure of intelligence being the Cal- 
ifornia Test of Mental Maturity (CTMM), 
Junior High Level, Form S8, 1957. Based 
upon standard deviations from the mean 
for the total population of Anglo-, Negro-, 
and Latin-American students, the subdivi- 
sions were IQ 113 and above (HMF), 84 to 
112 (AMF), and 83 or less (LMF). 

Cognitive variables. Among the 22 cogni- 
tive test variables, 11 were measures of 
achievement during the seventh grade.
Reading, language, and arithmetic achieve- 
ment were assessed by the California 
Achievement Tests (CAT), Junior High 
Level, Form W, 1957. CAT Reading had 
two subtests, Vocabulary and Comprehen- 
sion; CAT Language had subscores for 
Mechanics of English and for Spelling; and 
CAT Arithmetic combined Fundamentals 
and Reasoning. The remaining achievement 
instruments were STEP Social Studies and 
STEP Science from the Sequential Tests of 
Educational Progress, Cooperative Test 
Service, Form 3A, 1957. 

The other 11 cognitive test variables 
measured the cognitive, perceptual, and 
psychomotor potentialities of each boy and 
girl in the seventh grade year. STEP Listen- 
ing, which required comprehension of pas- 
sages read aloud, measured efficiency in the 
apprehension of verbal stimuli. Clerical Ap- 
titude and Mechanical Reasoning were 
chosen from the well-known battery of 
Differential Aptitude Tests (DAT), Form 
A, 1947. Vocabulary Completion and Gestalt 
Transformation, the latter to estimate abil- 
ity to shift the function of a part of an 
object and use it in a new way, were taken 
from a battery of factor tests used by Guilford,
Wilson, Christensen, and Lewis (1951).
Mutilated Words, Gestalt Completion, 
Short Words, and Copying were selected 
from a kit of reference tests, supplied by 
the Educational Testing Service, to evaluate 
various aspects of perceptual closure. Dot- 
ting and DRT, to measure psychomotor 
speed and discrimination reaction time, were 
used by special permission of the Air Re- 
search and Development Command, Lack- 
land Air Force Base. 

Noncognitive variables. The 22 noncogni- 
tive instruments were paper-and-pencil tests 
designed to assess motivations and other 
elements of personality as well as attitudes 
expressed in self reports. Scores for 11 test 
variables were obtained from an administra- 
tion of the IPAT Junior Personality Quiz 
(JPQ), 1952. Of the JPQ scales, only the 
one for Intelligence has not been included 
in the analyses. Two parts of the Texas 
Cooperative Youth Study, developed by 
Moore and Holtzman (1958) for a state- 
wide survey, provided scales to assess fam- 
ily tensions, negative social orientation, 
authoritarian discipline, personal maladjust- 
ment, criticism of education, criticism of 
youth, social inadequacy, and self-inadequacy.
The children's form of the Manifest
Anxiety Scale (CMAS), as used by Castaneda,
McCandless, and Palermo (1956),
was modified for administration to adolescents
(odd-even reliability .90 for 41 items).
Finally, the Brown-Holtzman Survey of
Study Habits and Attitudes (SSHA) was
adapted for use in secondary schools (odd-even
reliability .95 on 55 items keyed for
boys and girls). Thus modified, the SSHA
yielded two test variables, Scholastic Motivation
and Teacher Valuation.

Analyses of data. Distributions of scores 
on the 44 test variables were in the form of 
stanine values 1 to 9 with 5 as the mean 
value for the total population of 1,417 stu- 
dents. The transformation of raw scores 
into stanine values not only permitted 
maximum use of the IBM 650 facilities at 
the University Computing Center but also 
assured additivity and homogeneity of var- 
iance. Means for the sample population 
hovered about a stanine value of 5 with 
somewhat larger deviations than the total 
population since the subjects represented
extremes of family status and mental function
as well as boys and girls in the middle
range from each community.
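
The report does not say how raw scores were converted to stanines. As a rough sketch, the conversion can be illustrated in Python under the assumption that the conventional stanine bands (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the population) were applied to percentile ranks; the function name `to_stanines` is hypothetical.

```python
import numpy as np

# Conventional percentage of a population assigned to stanines 1 through 9
# (an assumption; the report does not state the bands it used).
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def to_stanines(raw_scores):
    """Map raw scores to stanine values 1-9 by percentile rank."""
    raw = np.asarray(raw_scores, dtype=float)
    n = raw.size
    sorted_raw = np.sort(raw)
    # Mid-rank percentile: scores below plus half of the tied scores.
    below = np.searchsorted(sorted_raw, raw, side="left")
    at_or_below = np.searchsorted(sorted_raw, raw, side="right")
    percentile = 100.0 * (below + at_or_below) / (2.0 * n)
    # Upper boundaries of stanines 1-8: 4, 11, 23, 40, 60, 77, 89, 96.
    cuts = np.cumsum(STANINE_PERCENTS)[:-1]
    return np.searchsorted(cuts, percentile, side="right") + 1


# Illustrative data only: 1,000 distinct raw scores.
stanines = to_stanines(np.arange(1000))
print(np.bincount(stanines)[1:], stanines.mean())
```

With such bands the population mean falls at a stanine of 5, as the text reports for the 1,417 students.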

The 44 analyses of variance were carried 
out on the IBM 650 computer and F tests 
were used to identify significant main ef- 
fects and interactions. Comparisons were 
made with the deviation term (SFMLP), 
which had 72 degrees of freedom, unless a 
lower-order interaction was significant at
the .01 or .05 level of confidence. Then intraclass
correlation coefficients were computed
for each significant mean square in
the resulting tables, employing the formula

$$ r_I = \frac{MS_a - MS_w}{MS_a + (n - 1)\,MS_w} $$

where:

$MS_a$ = mean square for "among groups"
$MS_w$ = mean square for "within groups"
$n$ = number of subjects in each group

and using the deviation term (SFMLP) as
$MS_w$ for each computation. The intraclass
correlation coefficients were regarded as
measures of the degree of "resemblance"
among subjects in the same sex role (S),
family status (F), level of mental function
(M), school location (L), or sharing some
combination of these attributes.
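
A minimal sketch of the intraclass computation follows, assuming the standard form of the coefficient, (MS among − MS within) / (MS among + (n − 1) MS within). The helper `mean_squares` uses a simple one-way layout for illustration only; in the study itself the within term was the SFMLP deviation mean square from the four-way analysis. Both function names and the sample scores are hypothetical.

```python
import numpy as np


def intraclass_r(ms_among, ms_within, n):
    """Intraclass correlation from two mean squares and the group size n."""
    return (ms_among - ms_within) / (ms_among + (n - 1) * ms_within)


def mean_squares(groups):
    """One-way 'among' and 'within' mean squares for equal-sized groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = len(groups[0])
    grand_mean = np.mean(np.concatenate(groups))
    ss_among = n * sum((g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return ss_among / (k - 1), ss_within / (k * (n - 1))


# Illustrative stanine-like scores for three groups of three subjects.
ms_a, ms_w = mean_squares([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
r_example = intraclass_r(ms_a, ms_w, n=3)
print(ms_a, ms_w, round(r_example, 3))
```

The coefficient approaches 1 when members of a group score alike relative to the spread between groups, and falls to −1/(n − 1) when group means do not differ at all.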


Factor Analyses 


The second approach to testing the
proposition that sex role and school location
influence performances on cognitive
and noncognitive tests involved a series of
factor analytic studies to map out relationships
among variables. By this time, the
remaining factor tests from the Guilford
battery had been scored for the total population
of 1,417 males and females in the
four school locations. Thus stanine values
for Unusual Uses, Consequences, Common
Situations, and Seeing Problems were added
to the master deck along with subject
grades assigned by teachers for the year
and grade point averages (GPA).

In each analysis, correlations among the
variables provided a matrix for the extraction
of centroid factors which were rotated
to an orthogonal normal varimax solution
(Kaiser, 1958) by appropriate programing.
The varimax criterion for analytic
rotation was elected since it had the greatest
likelihood of portraying factors invariant
under changing samples of tests and
populations. For this study, analyses were
carried out for each sex role (m, f) and for
each community (A, B, C, D) in addition
to the one for the total population.
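
The rotation step can be sketched as follows. This is not the project's original program: it is a commonly used SVD-based iteration for the varimax criterion, with Kaiser's row normalization applied before rotation and removed afterwards. The function name `normal_varimax`, its parameters, and the random loadings are illustrative assumptions, and the extraction of centroid factors is not shown.

```python
import numpy as np


def normal_varimax(loadings, max_iter=200, tol=1e-10):
    """Rotate a (variables x factors) loading matrix toward varimax.

    Rows are scaled to unit length first (Kaiser normalization) and
    rescaled by the original communalities after rotation.
    """
    A = np.asarray(loadings, dtype=float)
    h = np.sqrt((A ** 2).sum(axis=1, keepdims=True))  # root communalities
    h[h == 0] = 1.0                                   # guard empty rows
    A = A / h
    p, k = A.shape
    R = np.eye(k)                                     # orthogonal rotation
    previous = 0.0
    for _ in range(max_iter):
        L = A @ R
        target = L ** 3 - L * (L ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(A.T @ target)
        R = u @ vt
        current = s.sum()
        if previous > 0 and current <= previous * (1 + tol):
            break
        previous = current
    return (A @ R) * h, R


# Illustrative unrotated loadings: 10 variables on 3 factors.
rng = np.random.default_rng(0)
unrotated = rng.normal(scale=0.4, size=(10, 3))
rotated, rotation = normal_varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the distribution of loadings across factors shifts.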


Multiple Regression Analyses 


The third step involved multiple re- 
gression studies with selected criteria of 
academically talented behavior as depen- 
dent variables. These criterion measures 
were GPA Teacher Evaluation, CAT Reading,
CAT Language, CAT Arithmetic,
STEP Social Studies, and STEP Science.
Separate analyses were carried out for each
sex (m, f) in each of the four communities 
(A, B, C, D). The cognitive and noncogni- 
tive variables which showed promise as in- 
dicators of talented behavior were retained 
as dimensional or independent variables. In 
addition, CTMM Mental Function and ISS 
Family Status were added to represent dif- 
ferences in intellectual functioning and in 
family background, respectively, which were 
shown to be involved in the interactions
found in the initial variance analysis.
Rhymes, another of the Guilford et al.
(1951) factor tests to measure word fluency
or verbal facility, also was included. By 
this time, the meanings of 46 kinds of de- 
scriptions age-mates made of one another 
in response to nomination items had been 
sorted out by factor analytic and factor 
matching techniques (Hindsman, 1960). 
Thus five sociometric factor variables com- 
mon to boys and girls were included in the 
regression studies: namely, Peer Accept- 
ance, Absence of Negative Model Value, 
Social Effectiveness, Nondeviant vs. Devi- 
ant Behavior, and Quiet Dependency. 
An iterative technique (Greenberger &
Ward, 1956) was employed for the multiple
regression analyses. The technique, which is
a modification of the Kelley-Salisbury
method, was adopted since the independent
variables were so numerous. Using the IBM
650 computer, iteration was carried out to
that point in the program where the sums
of squares of regressed values, R², were not
raised more than a specified criterion value
of .0005 in the analysis. The stop criterion
provided a solution which avoided overfitting
the regression line. This solution was
regarded as one which would have minimum
shrinkage upon application of the regression
weights to subsequent samples. Variables
with regression weights of zero at the conclusion
of the iterations were regarded as
linear combinations of those with weights.
Thus it was not necessary to specify in advance
the subsets of independent variables
which were linear combinations of the others.
This was regarded as an important justification
for the method employed.
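
The general idea of iterating until R² stops rising by more than a criterion can be sketched as below. This is not the Greenberger and Ward program, whose details are not given here; it is a simple greedy, one-weight-at-a-time iteration on standardized variables that follows the same stopping logic. The function name `iterative_betas` and the small correlation matrix are hypothetical.

```python
import numpy as np


def iterative_betas(r_xx, r_xy, stop=0.0005, max_iter=10000):
    """Greedy one-weight-at-a-time solution of the normal equations.

    r_xx: predictor intercorrelations (unit diagonal assumed).
    r_xy: correlations of each predictor with the criterion.
    Iteration ends when the best single adjustment raises R^2 by no
    more than `stop`; predictors never adjusted keep a weight of zero.
    """
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    beta = np.zeros_like(r_xy)
    r_squared = 0.0
    for _ in range(max_iter):
        residual = r_xy - r_xx @ beta      # unexplained validity per predictor
        j = int(np.argmax(np.abs(residual)))
        gain = residual[j] ** 2            # rise in R^2 from adjusting beta[j]
        if gain <= stop:
            break
        beta[j] += residual[j]
        r_squared += gain
    return beta, r_squared


# Illustrative correlations among three predictors and a criterion.
r_xx = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
r_xy = np.array([0.6, 0.5, 0.4])

beta_stopped, r2_stopped = iterative_betas(r_xx, r_xy)
beta_full, r2_full = iterative_betas(r_xx, r_xy, stop=1e-14)
print(np.round(beta_stopped, 3), round(r2_stopped, 4))
print(np.round(beta_full, 3), round(r2_full, 4))
```

Stopping early leaves R² slightly below its least-squares maximum, which is the sense in which the criterion guards against overfitting.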


RESULTS 


The outcomes of the three studies
are summarized in a series of tables
which clearly show variability in test
performances between boys and girls
and from one community to another.
Many of the findings in the variance
analyses confirm what is already
known. What is new largely involves
the significant interactions. The factor
analytic and multiple regression studies
demonstrate the consequences of
the sex-typing of socialization pressures
and the influences of school location,
or community context, upon
ways in which variables combine to
map out different kinds of behavior
and to explain certain kinds of talent
valued in junior high schools.


Variance in Test Performances 


Table 1 shows results of the analyses
of variance for performances of
144 subjects on the 22 cognitive test
variables. In a similar manner, Table
2 gathers together the results for the
22 noncognitive measures. Instead of
the usual mean squares, the entries
are intraclass correlation coefficients
which measure the resemblance or
average degree of similarity among
persons classified alike. For example,
in Table 1, the coefficient .49 in the
column for mental function (M) and
row for CAT Reading Achievement
has replaced a mean square significant
at the .01 level in the original analysis
of variance. Reference to tables of
means, not shown in this report to
conserve space, reveals mean stanine
values of 7.9 for HMF, 5.2 for AMF,
and 4.3 for LMF subsamples. The coefficient,
$r_I$ = .49, represents the degree
to which 48 subjects in the same
category of mental function are alike
in performances on the standard test.
Each entry in the two tables may be
interpreted in a similar manner.

Cognitive tests. As one would ex- 
pect, Table 1 shows that, without ex- 
ception, mental function is a source of 
variation in performances on achieve- 
ment, cognitive, and psychomotor 
tests. DAT Clerical Aptitude (speed 
and accuracy), Gestalt Completion 
(visual figural recognition), and Dot- 
ting (psychomotor speed), however, 
appear only in interaction with other 
components of variance. In general, 
when the coefficients of resemblance 
are low, as in the case of Mechanical 
Reasoning and Discrimination Reac- 
tion Time, the subjects in AMF and 
LMF subsamples are more like one 
another than they are similar to HMF 
students who exceed their perform- 
ances. 

Independently of mental function,
family status clearly is a source of
variation in some cognitive performances.
The HFS boys and girls have
the advantages of upper, upper-middle,
and middle-middle class family
backgrounds. They resemble one another
more than they do either the
MFS or LFS subjects in various aspects
of reading, in mechanics of English,
in solving science problems, and
in perceiving and copying figures accurately.

[TABLE 1. Significant Intraclass Correlation Coefficients from
Analyses of Variance in Cognitive Performances of 144 Junior High
Students. Columns: Cognitive Variable; Source of Variation (S, F,
M, L); Interaction. Rows: CAT Reading Achievement, Vocabulary,
Comprehension; CAT Language Achievement, Mechanics of English,
Spelling; CAT Arithmetic Achievement, Fundamentals, Reasoning;
STEP Social Studies, Science, Listening; DAT Clerical Aptitude,
Mechanical Reasoning; Vocabulary Completion; Gestalt
Transformation; Mutilated Words; Gestalt Completion; Short Words;
Copying; Dotting; DRT Reaction Time. Note: Entries rounded and
decimal points omitted. Sources of variation are sex role (S),
family status (F), CTMM mental function (M), and school location
(L). Subjects within group, n: S = 72, F = 48, M = 48, L = 36,
ML = 12, FML = 4, SML = 6, SFML = 2. * p < .05. ** p < .01.]

Sex role apparently has a moderating
influence upon certain kinds
of cognitive test performances in the
first year of junior high school. Girls
resemble one another and excel boys
in various aspects of language achievement,
in the speed and accuracy with
which they cope with clerical tasks,
in identifying short four-letter words
embedded in rows of letters (speed of
perceptual closure), and in psychomotor
speed measured by a dotting
test. In addition, the girls appear to
score somewhat higher than the boys
in the seventh grade on verbal comprehension,
as measured by Vocabulary
Completion, and on Discrimination
Reaction Time. On the other hand,
boys outperform girls in reasoning
through pictorially presented mechanical
situations.

Differences in test performances 
traceable to varying patterns of edu- 
cational experiences from one commu- 
nity to another are most apparent 
in the case of achievement in arithme- 
tic. In terms of the sample populations 
drawn, subjects from the two Gulf 
Coast communities, C and D, excel 
those of the North Central cities, A 
and B. The variability between pairs 
of locations appears to be a function
of mastery of arithmetic fundamen- 
tals, not ability to reason numerically. 
In language achievement, subjects in 
Community C have the highest mean 
scores and those in A the lowest, B 
and D being similar. The differences 
may be traced to the relative mastery 
of the mechanics of English language 
by students in the four locations. 
Moreover, the small but significant 
variations from place to place in speed 
of perceptual closure, as measured by 
Short Words, parallel the perform- 
ances on language tests. On the other 
hand, the communities rank C, A, D,
B on the two vocabulary test variables.

Each of the significant interactions 
in Table 1 involves school location as 
one of the components of variance, 
and four of the nine reflect sex-typed 
influences. In two instances, mastery 
of the fundamentals of arithmetic and 
clerical speed and accuracy, pairs of
boys and girls with similar intelligence
and family backgrounds in the same
community (SFML) resemble each
other significantly. Performances on 
the three STEP tests, Social Studies, 
Science, and Listening (cognitive ap- 
prehension), as well as Gestalt Trans- 
formation (conceptual redefinition), 
vary from community to community 
according to family background and 
quality of mental function (FML) of 
the junior high students. Aspects of 
perceptual closure, measured by Ges- 
talt Completion and the Copying test, 
also vary from location to location, 
but this time in terms of sex-typed 
learning experiences and intellectual 
functioning (SML). Finally, psycho- 
motor speed, measured by the Dotting 
test, has a different relation to quality 
of mental function from one place to 
another (ML) independently of the 
tendency of girls to excel boys in the 
four communities. 
Noncognitive tests. The analyses of
noncognitive performances summa- 
rized in Table 2 refer only to self-re- 
port or paper-and-pencil tests. Re- 
sponses of seventh grade students to 
noncognitive instruments apparently 
are less frequently influenced by var- 
iations in intellectual functioning than 
performances on cognitive tests. The 
48 subjects in the average or AMF 
subsample resemble one another in 
having more nervous tension (JPQ 
2) and reported anxiety symptoms 
(CMAS) than those of either high or 
low intelligence. Thus the relation of 
these two variables to mental function 
(CTMM) is curvilinear rather than 
linear. The relation of a preference for 
authoritarian discipline (CYS) to in- 
telligence also is somewhat curvilinear, 
mean stanine values being 4.6 for 
HMF, 5.4 for AMF, and 5.2 for the 
LMF subsamples. 

Only in two instances does family 
status have an independent effect upon 
distributions of scores obtained from 
the self-reports. The subjects from 
MFS homes, lower-middle and mobile 
upper-lower class, resemble one an- 
other in having less independent dom- 
inance (JPQ 9) than the 48 subjects 
in either the HFS or LFS subsamples. 
As expected, the lower class boys and 
girls in the LFS subsample tend to ex- 
press a negative orientation to society 
(CYS), whereas subjects in the other 
two categories are alike in being pos- 
itive. 

A number of sex-typed differences
may be noted in the noncognitive test
behavior. Girls in the sample population
represent themselves as being
emotionally sensitive, surgent or talkative
and excitable, high in socialized
morale or acceptance of school and
cultural standards, low in independent
dominance, tolerant and slow to anger,
and valuing their teachers positively.
On the other hand, boys resemble one
another more than girls in being tough-minded,
more serious or desurgent, reacting
negatively to learning tasks
and school authority, independent,
impatient, and not so appreciative of
their teachers. Sex role, family status,
and mental function (SFM) combine
to influence responses to the CYS
scales for criticism of education and
feelings of self-inadequacy. In terms
of the self-reports, the eight lower
class girls of high intelligence, two
from each community, are least critical
and feel least inadequate. Most
critical of the school and its expectations
are boys of low intelligence from
lower-middle and mobile upper-lower
class families. Feelings of self-inadequacy
are highest among lower class
boys who also are low in mental function.

[TABLE 2. Significant Intraclass Correlation Coefficients from
Analyses of Variance in Noncognitive Performances of 144 Junior
High Students. Columns: Noncognitive Variable; Source of
Variation (S, F, M, L); Interaction. Rows: JPQ 1
Sensitivity/Toughness, 2 Tension/Relaxation, 3 Emotional/Stable,
4 Control/Casualness, 5 Impatient Dominance, 6
Sociable/Withdrawn, 7 Adventurous, 8 Socialized Morale, 9
Independent Dominance, 10 Energetic Conformity, 11
Surgency/Desurgency; CYS Family Tension, Negative Social
Orientation, Authoritarian Discipline, Personal Maladjustment,
Criticism of Education, Criticism of Youth, Social Inadequacy,
Self Inadequacy; CMAS Anxiety; SSHA Scholastic Motivation,
Teacher Valuation. Note: Entries rounded and decimal points
omitted. Sources of variation are sex role (S), ISS family status
(F), CTMM mental function (M), and school location (L). Subjects
within group, n: L = 36, SL = 18, FL = 12, SFM = 8, FML = 4,
SFML = 2. * p < .05. ** p < .01.]

School location is an independent 
source of variation in only one set of 
responses to the noncognitive instru- 
ments: namely, independent domi- 
nance which is highest among the 36 
subjects in Community C and lowest 
in D. On the other hand, 9 of the 11 
significant interactions in Table 2 in- 
volve variability from one community
to another. The most interesting ones
are representations of social inade- 
quacy (SL, FML), self-inadequacy 
(FML), and family tension (SFML), 
all departing from zero at the .01 level 
of confidence. For example, the 18 
boys in Community B express the 
greatest social inadequacy (SL). The 
least social and self-inadequacy on the 
CYS scales is represented by the four 
lower class subjects of high intellec- 
tual calibre in Community A (FML). 
Pairs of boys or girls with similar 
level of mental function and fam- 
ily status in the same community 
(SFML) have a remarkable resemblance
in CYS scores for family tension
($r_I$ = .62).

None of the cognitive test variables 
and only six of the noncognitive in- 
struments remain uninfluenced by one 
or more of the components of variance 
included in the research design. They 
are JPQ 3, 4, 6, 7, and 10, together 
with CYS values for criticism of
youth. Of these, only JPQ 3 (emotionality
vs. stability) and 6 (sociable vs.
withdrawn) later proved to be promising
indicators of talented behavior
(McGuire, Hindsman, Jennings, & 
King, 1961). As a main effect or in an 
interaction, there are sex-typed dif- 
ferences in performances on 12 of the 
22 cognitive tests and on 13 of the 
22 noncognitive measures. Similarly, 
there are either independent or com- 
plex variations in responses from one 
community to another for 15 of the 
cognitive and 9 of the noncognitive 
variables. 


Factor Analytic Studies 


Table 3 presents a sample of perti- 
nent loadings for cognitive and non- 
cognitive variables from the factor 
analytic studies employed to sort out 
abilities and other attributes of the 
seventh grade population and a number
of subpopulations.³ The table is
arranged so that magnitudes of load- 
ings for the two sex roles (m, f) and 
among the four communities (A, B, 
C, D) may be compared for Factor B 
(academic grades) and Factor D (so- 
cially oriented achievement motiva- 
tion). The number of subjects for 
each analysis is entered at the base 
of each column in the table. Following 
convention, loadings of .30 or greater 
are entered for the variables defining 
each factor. The letter a denotes the 
highest loading of a variable for the 
population studied. 

³ A four-page table (Table A) giving
comparisons of factor loadings from 50-variable
matrices for total population, sex
role, and school location has been deposited
with the American Documentation Institute.
Order Document No. 6610 from ADI
Auxiliary Publications Project, Photoduplication
Service, Library of Congress;
Washington 25, D.C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies.

[TABLE 3. Factor Loadings of Cognitive and Noncognitive Variables
for Grade VII Population, by Sex Roles, and Communities. Columns:
Variable; Grade VII; Sex Roles (m, f); Communities (A, B, C, D).
Rows for Factor B, Academic Grades: GPA Teacher Evaluation;
Teacher Social Studies; Teacher English; Teacher Mathematics; CAT
Arithmetic; CAT Language; STEP Social Studies; CAT Reading; STEP
Science; DRT Reaction Time; JPQ 3 Emotional/Stable. Rows for
Factor D, Socially Oriented Achievement Motivation: JPQ 8
Socialized Morale; JPQ 9 Independent Dominance; JPQ 3
Emotional/Stable; SSHA Scholastic Motivation; JPQ 6
Sociable/Withdrawn; CYS Criticism of Education; CMAS Anxiety; JPQ
1 Sensitive/Tough; JPQ 11 Surgent/Desurgent; CYS Negative Social
Orientation; GPA Teacher Evaluation; Teacher Social Studies; CYS
Social Inadequacy; CYS Maladjustment; CYS Family Tension.
Population, N = 1,417 for Grade VII. Note: Decimal points
omitted. a = highest loading for a variable in a factor
structure.]

Sex roles. As shown in Table A
(deposited with ADI), the paper-and-pencil
test responses and teacher evaluations
of the 1,417 junior high
school students, 772 males and 645
females, can be mapped out into seven
factors. Factor matching of loading
magnitudes (Cattell, 1957, pp. 818-827)
indicates a reasonably close
agreement (correlations of .70 or
greater over all variables) in the
case of five of the seven factors for
boys and girls. The factors common
to both sexes have been provisionally
named Scholastic Achievement
(largely defined by standard tests of
achievement), Academic Grades (reflecting
the teacher evaluations of performances
in the seventh grade), Divergent
Thinking (three of the four
proposed tests of creativity), Ineffectively
Functioning Personality (self-reports
of maladjustment, inadequacy,
tension, and anxiety), and Social
Alienation (authoritarian and antisocial
attitudes).
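
The agreement check described under Sex roles, correlating loadings over all variables and treating .70 or greater as a match, might be sketched as follows. This is only one plausible reading of the procedure, not Cattell's full method; the function name `match_factors` and the random loadings are hypothetical.

```python
import numpy as np


def match_factors(loadings_a, loadings_b, threshold=0.70):
    """Pair factors from two solutions by correlating their loadings.

    Both inputs are (variables x factors) matrices over the same
    variables in the same order. Returns the cross-correlation matrix
    and a list of (factor_in_a, factor_in_b, r) for pairs whose
    absolute correlation reaches the threshold.
    """
    A = np.asarray(loadings_a, dtype=float)
    B = np.asarray(loadings_b, dtype=float)
    k_a = A.shape[1]
    # Rows of the stacked input are factors; keep the A-by-B block.
    cross = np.corrcoef(A.T, B.T)[:k_a, k_a:]
    matches = []
    for i in range(k_a):
        j = int(np.argmax(np.abs(cross[i])))
        if abs(cross[i, j]) >= threshold:
            matches.append((i, j, float(cross[i, j])))
    return cross, matches


# Illustrative loadings: the second solution reorders and rescales
# the factors of the first.
rng = np.random.default_rng(1)
boys = rng.normal(scale=0.4, size=(20, 3))
girls = 0.9 * boys[:, [2, 0, 1]]
cross, matches = match_factors(boys, girls)
print(np.round(cross, 2))
print(matches)
```

Factors left without a partner at the chosen threshold correspond to the two factors reported as showing least agreement between the sexes.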

The two factors with least agreement
between the sex roles turn out to
be Socially Oriented Achievement
Motivation (acceptance of school and
cultural standards, conformity, stability,
and scholastic motivation) and
Perceptual Closure. Factor loadings
for the former, Factor D, are shown in
Table 3. Girls who have a positive
attitude toward school and academic
attainments are less anxious than boys
and appear to be able to be critical of
educational practices whereas their
male counterparts are not. Elements
common to both sexes on the perceptual
factor involve ability to cognize
symbolic units (Mutilated Words,
Short Words, DAT Clerical). For girls,
this ability is linked with achievement
on a language test; for boys, symbolic
closure is tied in with visual figure
recognition and psychomotor speed.

Community variability. With one
exception, Social Alienation, the factors
mapped out for the total population
fail to match across the four
communities. The factor structure for
Community C, which had 488 boys
and girls in departmentalized classrooms
of two junior high schools, is
most like that of the total population.
In the other school locations, Perceptual
Closure is absorbed either by
Divergent Thinking (A and D) or by
Scholastic Achievement (B). Community
A, which had 338 students in
modified self-contained classrooms of
one large junior high, otherwise shows
a pattern which is reasonably similar
to C and the total population. In Community
D, where 264 pupils were
housed in self-contained classrooms
of county schools as well as in a
smaller departmentalized junior high
school, Academic Grades appear to be
more closely associated with performances
on standard tests of achievement
than in Locations A and C.

The factor analysis for Community 
B, where seventh grade students were 
housed in self-contained classrooms at 
five different elementary schools pend- 
ing the completion of another junior 
high building, yields a sorting of abili- 
ties and other attributes quite different 
from Locations A, C, and D. Only 
four factors are necessary to map the 
paper-and-pencil test responses and 
teacher evaluations of the 319 boys 
and girls. High loadings for the stand- 
ard tests of achievement and for 
teacher evaluations combine with two 
measures of divergent thinking (See- 
ing Problems, Consequences) and in- 
dicators of perceptual closure to form 
a composite factor which might be 
called Academic Attainment. Thus for
Factor B (Academic Grades), there
are no loadings shown for Community
B in Table 3. Factor D (Socially
Oriented Achievement Motivation) in 
Table 3 for Community B becomes 
what might be termed “effectively 
vs. ineffectively functioning personal- 
ity,” a polar arrangement of JPQ, 
CYS, and similar noncognitive scales. 
Only Social Alienation remains simi- 
lar to the other communities. 


Regression Studies 


Table 4, which indicates the regressions of CAT Language Achievement
in Grade 7 upon certain cognitive and noncognitive variables by sex
role in each of the four communities, serves to illustrate the results
of the regression studies. Only the beta weights for variables which
illustrate between-sex and cross-community variability most effectively
are shown. Similar variability of beta weights for variables between
the sexes from community to community occurs when the criterion
measures are CAT Reading, CAT Arithmetic, STEP Social Studies, STEP
Science, and GPA Teacher Evaluation.* When GPA Teacher Evaluation is
the dependent variable, the sociometric factor variables and some of
the noncognitive indicators of talented behavior have higher loadings
than they do for standard tests of achievement used as criteria.
Although the loading magnitudes for the set of variables vary with each
criterion, the multiple correlation coefficients are uniformly high and
range from .75 to .86 (McGuire et al., 1961).

* A set of six tables (Tables B to G) showing regressions of CAT
Language, CAT Reading, CAT Arithmetic, STEP Social Studies, STEP
Science, and GPA Teacher Evaluation by community and sex role has been
deposited with the American Documentation Institute. Order Document
No. 6610 from ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D.C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies. Make checks payable to
Chief, Photoduplication Service, Library of Congress.


TABLE 4
REGRESSIONS OF CAT LANGUAGE ACHIEVEMENT IN GRADE VII UPON COGNITIVE
AND NONCOGNITIVE VARIABLES BY COMMUNITY AND SEX ROLE

Columns: Community (A, B, C, D), by sex role

Variable(a)

Multiple correlation, R
Sample population, N

CTMM Mental Function
DAT Mechanical
Gestalt Transformation
Common Situations
Short Words
Dotting
JPQ 1 Sensitivity
JPQ 8 Social Morale
JPQ 11 Surgency
SSHA Scholastic Motivation
N-2 Model Value
ISS Family Status

[Cell values are not legible in this copy.]

Note. Rounded to two places, decimal points omitted in lower portion
of table.

(a) Beta weights for all 32 variables are shown in Table [letter
illegible], deposited with the American Documentation Institute.


The important point to be noted in Table 4, and in the complete Tables
B to G (deposited with ADI), is that the subset of independent
variables which yields the maximum multiple correlation with the
criterion measure is not the same from one community to another, nor
for males and females at each of the school locations. In each column
of the complete tables, the omitted variables (which have beta weights
of zero) can be regarded as linear combinations of those with loading
magnitudes for the particular subpopulation. Thus the regression
studies supply further evidence that the sex-typing of socialization
pressures and the context of educational experiences apparently do have
quite an influence upon ways in which cognitive and noncognitive
variables representing potentialities, expectations, and pressures
combine to explain various kinds of valued behavior.
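The dependence of beta weights on the particular subpopulation can be illustrated with a small sketch. It uses the standard two-predictor formulas for standardized regression weights and the multiple correlation; the correlation values and the subgroup labels below are hypothetical and are not taken from Table 4 or from the deposited tables.

```python
import math


def beta_weights_two_predictors(r_y1, r_y2, r_12):
    """Standardized (beta) weights for a criterion regressed on two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation R of the criterion with both predictors."""
    beta_1, beta_2 = beta_weights_two_predictors(r_y1, r_y2, r_12)
    return math.sqrt(beta_1 * r_y1 + beta_2 * r_y2)


# Hypothetical correlations for two subgroups (illustrative values only).
subgroups = {
    "subgroup 1": dict(r_y1=0.70, r_y2=0.40, r_12=0.30),
    "subgroup 2": dict(r_y1=0.60, r_y2=0.55, r_12=0.50),
}

for label, r in subgroups.items():
    b1, b2 = beta_weights_two_predictors(**r)
    print(label, round(b1, 2), round(b2, 2), round(multiple_r(**r), 2))
```

Even when two subgroups reach similar multiple correlations, the weights carried by the individual predictors can differ, which is the pattern the regression studies describe.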


DISCUSSION 


The series of variance, factor ana- 
lytic, and regression studies of sex 
role and community variability in 
test performances, as well as teacher 
evaluations and peer appraisals, con- 
firm some well-known findings and 
raise questions about others. For ex- 
ample, the variance analyses of sev- 
enth grade data fail to support the 
generally held conclusion that females 
excel in numerical computation and 
males are superior in mathematical 
reasoning and in science (Anastasi, 
1958, pp. 452-504; Tyler, 1956, pp. 
247-275). Although the reputed ag- 
gressiveness of boys can be read into 
their greater tough-mindedness, in- 
dependence, and negative reactions to 
learning tasks and school authority, 
the seventh grade males are more
serious (desurgent) than their female 
counterparts in this study. The higher 
emotional sensitivity and excitability 
(surgency) of the girls might be in- 
terpreted as symptoms of neuroticism 
and instability, but this is belied by 
their tolerance, socialized morale, and 
restraint. 

Taken together, the series of studies 
clearly support the proposition that 
sex role and school location have a 
moderating influence upon the per- 
formances of junior high school stu- 
dents on cognitive and noncognitive 
instruments. In the variance analyses, 
more than half of the responses ap- 
pear to be biased by whatever is in- 
volved in learning a male or a female 
sex role. The consequences are more 
evident in the regression than the 
factor analytic studies, where five of 
seven factors showed a reasonable 
match between the sexes. Within com- 
munities, however, the differences be- 
tween males and females are quite 
marked where the same criterion 
measure regresses on different sets of 
independent variables with fluctuat- 
ing beta weights. 

Similarly, more than half of the 
distributions for cognitive tests, and 
nearly half for the noncognitive in- 
struments, vary from one school loca- 
tion to another. These variations are 
illustrated most clearly by the differ- 
ences among communities in the mas- 
tery of fundamentals of arithmetic 
and mechanics of English. More im- 
portant, however, are the frequent 
interactions of school location with 
other components of variance. They 
would lead one to infer not only that 
patterns of educational experiences 
differ but also that more subtle in- 
fluences are being exerted by the four 
community contexts. These inferences 
are borne out both by the differences 
in factor structure, which parallel degree
of departmentalization or organization
in terms of self-contained
classrooms, and by the regression
studies.

One of the unexpected findings is 
the frequency with which life style is 
independent of intelligence in influenc- 
ing behavior on cognitive tests, es- 
pecially standard measures of achieve- 
ment. The most probable explanation 
of the additive relation between social 
class and mental function is that high 
family status is an index of the kinds 
of motivation and socialization pres- 
sures which maximize experiences in- 
volving reading, the effective use of 
English, and concern about scientific 
knowledge. This interpretation paral- 
lels a finding in another study (McBee 
& Duke, 1960) that intelligence and 
SSHA scholastic motivation have in- 
dependent effects upon achievement 
in arithmetic, reading, and science, but 
not language and social studies. 

The foregoing findings posed a very 
difficult problem for the research team. 
They suggested that the sex-typing of 
socialization pressures and the in- 
fluences of school location upon ways 
in which the proposed classes of vari- 
ables (potentialities, expectations, 
pressures) combined were more im- 
portant than first postulated. If so, 
a unique combination of indicators 
to explain and predict talented behav- 
ior would have to be worked out for 
males and females of each community. 
The alternative was to estimate the 
underlying factors in persons (Cattell, 
1957, pp. 287-296; Guilford, 1954, pp. 
524-526) to attain what Guilford 
terms “a much more standard article.” 


SUMMARY 


A series of variance, factor analytic, 
and regression studies were under- 
taken to test the proposition that sex 
role and school location have a mod- 
erating influence upon the perform-
ances of junior high school students
on cognitive and noncognitive instru- 
ments selected as potential indicators 
of talented behavior. Using a factorial 
design which involved 144 boys and 
girls from four communities in their 
seventh grade year, analyses of vari- 
ance were carried out for 22 cognitive 
and 22 noncognitive variables. These 
analyses demonstrated many expected 
and some unanticipated main effects 
of differences in sex role, family status, 
mental function, and school location. 
More important, they showed 20 sig- 
nificant interactions, 18 of which in- 
volved variations in test behavior 
from one community to another. Then 
the consequences of sex role and com- 
munity variability in test perform- 
ances, first identified in the variance 
analyses, were demonstrated in factor 
analytic studies and multiple regres- 
sion analyses. 

The regression analyses revealed
that various kinds of behavior valued
in junior high schools could be explained
with multiple correlations
ranging from .75 to .86 by varying
combinations of cognitive and non- 
cognitive variables. But these sets of 
independent variables to be used as 
predictors were different for each com- 
munity and for boys and girls within 
each school location. This develop- 
ment paralleled the finding that factor 
structures varied for males and fe- 
males, and even more so from one 
community to another. 
SIGNIFICANT IQ CHANGES IN TWENTY-FIVE YEARS:
A FOLLOW-UP*


KATHERINE P. BRADWAY 


Stanford University 


The following constitutes a report 
of several statistical analyses of in- 
telligence test and family background 
data obtained in 1931, 1941, and 1956, 
on 110 members of the original pre- 
school standardization sample of the 
1937 Revision of the Stanford-Binet.* 
An initial report of the study has ap- 
peared previously (Bradway, Thomp- 
son, & Cravens, 1958). 

In 1931, 212 children in the San Francisco Bay Area, aged 2 to 5½,
were given tests which later constituted Forms L and M of the 1937
Stanford-Binet. Careful methods of selection were adhered to in
selecting these subjects, as they were the California sample of the
nationwide standardization of this revision (McNemar, 1942; Terman &
Merrill, 1937). Ten years later, the senior author administered Form L
of the Stanford-Binet to the 138 of these children who still remained
in the area.

In 1956, 110 of those tested in 1931 and 1941 were located and given
both Form L of the 1937 Stanford-Binet and the Wechsler Adult
Intelligence Scale. No interim contact had been made between 1931 and
1941, or between 1941 and 1956.

* This investigation was supported by PHS Research Grant M-1273 (C2)
from the National Institutes of Health, United States Public Health
Service.

* Formerly at Stanford University.

* The authors are grateful to Maud A. Merrill and the late Lewis M.
Terman for generously having made their standardization data available
for follow-up; to Quinn McNemar, consultant for both follow-ups, who
has been an invaluable advisor; and to Clare Thompson for her several
contributions to the handling of data and to their presentation.

An important aspect of the 1941 study involved those 50 subjects whose
test scores showed statistically significant increase or decrease
since the original (1931) testing. Extensive interviews with their
parents were conducted to assess the intellectual stimulation in the
home and to obtain data on the intellectual level of the parents and
grandparents. Significant differences between the groups were shown on
an “ancestral index” and related measures which estimated the
intellectual abilities of parents and grandparents (Bradway, 1945).
Included in the 1956 testing were 21 of the 26 subjects whose IQs had
shown the most increase between 1931 and 1941 (“1941 increase group”),
and 21 of the 24 subjects whose IQs had decreased most during that
period (“1941 decrease group”). It is these 42 subjects who figure most
prominently in the analyses to follow, which were designed to determine
whether a change between preschool and junior high school age is more
likely to be followed in adulthood by a change in the same direction,
the opposite direction, or no change at all in the individual’s status
compared to the rest of the group. A further question involves the
previous finding that a significant decrease in IQ was associated with
a significantly lower intellectual level in the two previous
generations than was a significant increase in IQ.
The subjects in the present study had been administered Forms L and M
of the Revised Stanford-Binet Scale 25 years previously, in 1931, in
connection with its standardization, and Form L of the same scale in
1941 (Bradway, 1945). A comparison of the 1941 retest group of 138
subjects with the total 633 subjects of like age in the original United
States standardization group showed some upward selection. The initial
mean composite IQ of the retest group was 109.2, compared with 105.4
for the standardization group.

In selecting for further study the 54 subjects whose IQs had changed
most, special corrections were applied to the initial composite IQs.
These allowed for regression of the retest (1941) IQ due to errors of
measurement of the initial (1931) IQ, and equated initial and retest
IQs in relation to the means of their distributions. The details of
these adjustments have been reported by Bradway (1945).

In obtaining information in 1941 about home environment, not only were
the 50 children interviewed, but also mothers of 47 and fathers of 3 of
them. The mothers, in addition, were given a vocabulary test of
intelligence and each child was given the Woodworth-Cady Questionnaire.
Twenty-six scores were obtained for a wide variety of variables such as
intellectual stimulation of the home, intelligence of parents,
grandfathers’ intelligence as estimated by the Minnesota Scale of
Occupational Intelligence (Brussell, 1932), happiness and socialization
of the household, and child’s general adjustment. The most consistent
and marked differences occurred in scores reflecting the intelligence
of parents and grandparents. An index which gave weight both to
parental intelligence and grandfathers’ occupational intelligence
produced a critical ratio of 3.32 (p < .001) between the groups, the 26
children in the “increase” group coming from families with
significantly higher index values than the 24 children in the
“decrease” group.

The loss of 28 subjects between 1941 and 1956 testings resulted in
further upward selection with respect to initial (1931) mean IQ, which
was 111.1 for the 1956 group, or 5.7 points higher than the mean for
the total standardization group of these ages. At the time of the 1956
testing, subjects were interviewed about their occupation, marital
status, education, and health. Information about parents’ education and
grandfathers’ occupation was gathered by mail and is available for 98
of the 110 subjects. Ancestral indices were calculated using methods
similar to those employed by Bradway in the previous study.


RESULTS 

Test Scores for 1941 Groups 

An earlier article (Bradway et al., 
1958) reported a correlation (r) of .59 
between preschool Stanford-Binet IQs 
and adult Stanford-Binet IQs obtained 
25 years later; the correlation between 
adolescent and adult IQs obtained in 
1941 and 1956 for the same group was
.85. They also found that the obtained
1956 Stanford-Binet IQs were higher 
than would be expected on the basis 
of the 1931 and 1941 IQs of this group, 
although the standard deviations of 
the distributions were similar. 

Presumably, intellectual growth does not cease by the age of 16 as had
been assumed in the 1937 Revision (Terman & Merrill, 1937). This group
had attained a mean mental age of 18.5 years as adults, instead of 16.8
which would have been predicted from their adolescent IQs in the
absence of mental growth beyond age 16 years. This finding is in
correspondence with the reports of several recent investigators that
intellectual growth continues well past adolescence (Bayley, 1955;
Bayley & Oden, 1955; Miner, 1957; Owens, 1953). To equate for this
continued growth and to permit meaningful comparisons between an
individual’s status within the group at the two testings, the mean
difference between the 1941 and 1956 IQs (11.5 points) was subtracted
from the obtained 1956 Stanford-Binet IQs.* The logic underlying this
correction is similar to that upon which the use of the Deviation IQ is
based: namely, that the IQ is intended as an index of the individual’s
relative standing within his age group, rather than as a measure of his
level of mental development per se. There was no reason to suspect that
these particular subjects had exhibited unusual mental growth which
would have elevated the mean IQ of this sample above the mean IQ of the
general population; and their Deviation IQs on the WAIS tended to
substantiate this conclusion. Using standard scores would have been an
alternate way of equating the two sets of scores, but since the
standard deviations of the two sets of IQs were similar, equating the
means by subtracting the mean difference from each of the 1956 IQs was
sufficient to make them comparable to the 1941 IQs as indices of
relative position within the group.

* A forthcoming article by Bradway and Thompson will consider adult
intelligence and the problems of appropriate indices.
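The mean-equating correction described above can be sketched in a few lines. The IQ values are invented for illustration (the actual group difference reported in the text is 11.5 points), and the function name is an assumption of this sketch.

```python
from statistics import mean


def equate_to_earlier_mean(later_iqs, earlier_iqs):
    """Subtract the mean gain from every later IQ so both sets share a mean.

    Differences between individuals are untouched, so each person's
    standing within the group is preserved.
    """
    shift = mean(later_iqs) - mean(earlier_iqs)
    corrected = [iq - shift for iq in later_iqs]
    return corrected, shift


# Hypothetical scores for five subjects tested on two occasions.
iq_1941 = [95, 104, 110, 121, 130]
iq_1956 = [108, 113, 125, 131, 143]

corrected_1956, shift = equate_to_earlier_mean(iq_1956, iq_1941)
print("mean gain removed:", shift)
print("corrected 1956 IQs:", corrected_1956)
```

Because a constant is subtracted, this is adequate only when the two standard deviations are similar, as the text notes; otherwise standard scores would be the more appropriate choice.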

The means and standard deviations 
of the resulting corrected IQs of the 
present sample are included in Table 
1, arranged according to the subgroups 
to which they were assigned in 1941. 
In this table are presented also means 
of the Stanford-Binets administered 
in 1931 and 1941 and of the WAIS 
administered in 1956 to the same 
groups. The respective mean chronological
ages at the times of these
three testings were 4.0, 13.6, and 29.5
years.

TABLE 1
MEAN IQ AND STANDARD DEVIATION OF 1941 SUBGROUPS FOR EACH TEST

Columns (1941 Subgroups): Decrease (N = 21), Increase (N = 21),
No change (N = 68)

Test

1931 Forms L, M
    Mean IQ(a)
    SD
1941 Form L
    Mean IQ
    SD
1956 Form L
    Mean IQ (Corr.)(b)
    SD
1956 WAIS
    Mean IQ
    SD

[Cell values are not legible in this copy.]

(a) Average of two IQs minus one point to correct for practice effect.
(b) IQ minus 11.5 points to equate 1941 and 1956 group means.


It will be noted from Table 1 that for both 1941 change groups, the
observed changes to 1956 are in the reverse direction from the previous
changes. It would seem that the group whose rate of mental growth had
shown deceleration from preschool to adolescence, however, had more
nearly reached a level of brightness by adolescence that was largely
maintained into adulthood. Subtracting their 1941 IQs from their 1956
(corrected) IQs yielded a t value for this decrease group of 1.94, with
a probability between .05 and .10. On the other hand, the group whose
mental growth had shown acceleration between preschool and adolescence
had tended to lose some of their relative gains by the time they
reached adulthood. The corresponding t value for this increase group
was -3.02, which is significant beyond the .01 level. Moreover, the
difference between the 1956 mean IQs for the two groups is reliably
less than the difference between the 1941 mean IQs for the two groups,
as indicated by a t value of 3.59, which is significant beyond the .001
level.
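The t values for change within a group appear to rest on each subject's own difference score. A minimal sketch of such a test for correlated (paired) scores follows; that the authors used exactly this form is an assumption, and the scores are invented.

```python
import math
from statistics import mean, stdev


def paired_t(earlier, later):
    """t for the mean of within-subject differences (later minus earlier).

    Returns the t value and its degrees of freedom (n - 1).
    """
    diffs = [b - a for a, b in zip(earlier, later)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical IQs for four subjects at two testings.
iq_earlier = [100, 102, 98, 105]
iq_later = [103, 104, 102, 106]

t_value, df = paired_t(iq_earlier, iq_later)
print(round(t_value, 2), df)
```

A positive t indicates a mean gain and a negative t a mean loss, which matches the signs reported for the decrease and increase groups.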

An index of the stability of the 
scores is provided by determining 
whether, in fact, the groups remain 
significantly different from each other, 
as they were in 1941. Analyses of variance
of the 1956 data yielded an F
of 12.28 for the Stanford-Binet IQs
and an F of 7.40 for the WAIS Full
Scale IQs, with probabilities of <.001
and .001, respectively. The groups do,
then, remain significantly different 
from each other. 


Further Investigation of the “Ances- 
tral Index” 


In an effort to establish the general- 
ity of the finding that ancestral intel- 
ligence is related to the direction of 
change in mental growth, each subject 
was asked by mail to indicate how 
far each of his parents had gone in 
school, and the occupation of each of 
his grandfathers. This information 
was already available for those sub- 
jects of the 1941 “change” groups, but 
was not known for the 68 subjects who 
had been members of the “no change” 
group. The mean initial (1931) IQ of 
the 98 subjects for whom these data 
became available was 112.3. The mean 
educational level of their parents was 
11 years, which is comparable to the 
mean educational level of California 
residents according to the United 
States Bureau of the Census (1950). 

In the 1941 study, the weighted index had been calculated on ranked
data for the 50 subjects in the “change” groups, giving equal weight to
the following: (a) mothers’ intelligence as ranked by interviewer
(largely based on Vocabulary test score), (b) fathers’ intelligence as
ranked by interviewer (largely based on occupation), and (c) average of
grandfathers’ occupational scores on the Minnesota occupational
classification (Brussell, 1932). Ranks on each variable were translated
to scores on a normalized distribution according to a method suggested
by Hull (Garrett, 1932, p. 113). Because the data consist of ranks
within this selected sample only, the absolute scores cannot be
translated directly to other groups.
A similar method was used in calcu- 
lating the ancestral index for the 1956 
data. In this case, however, the rank 
order of mothers’ intelligence was es- 
timated solely on the basis of educa- 
tion, an index admittedly inferior to 
Vocabulary test score, but the best 
index available to us. A rank order 
correlation of .72 was found between 
education and interviewer estimates 
(largely based on Vocabulary score) 
for the 40 subjects in the deviation 
groups for whom both figures were ob- 
tained in 1941. Fathers’ intelligence 
was estimated by education and Min- 
nesota occupational classification, 
equally weighted. Grandfathers’ oc- 
cupations were classified as before. 
Each rank was translated into a score 
according to the Hull method, and the 
scores were combined to give equal 
weight to mothers’, fathers’, and 
grandfathers’ (combined) scores. 
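The translation of ranks into normalized scores, and their equal weighting, can be sketched as follows. The percent-position formula (rank minus one half, divided by N) is the form commonly attributed to Hull's method; whether it matches the exact table in Garrett is an assumption here, as are the function names and the use of unit-normal deviates rather than a 10-point scale.

```python
from statistics import NormalDist


def rank_to_normal_score(rank, n):
    """Convert a rank (1 = highest of n) to a unit-normal deviate.

    The rank is first turned into the proportion of the group falling
    below the midpoint of that rank, then into the matching z score.
    """
    proportion_below = 1.0 - (rank - 0.5) / n
    return NormalDist().inv_cdf(proportion_below)


def ancestral_index(mother_rank, father_rank, grandfather_rank, n):
    """Equal-weight average of the three normalized scores (illustrative)."""
    ranks = (mother_rank, father_rank, grandfather_rank)
    return sum(rank_to_normal_score(r, n) for r in ranks) / len(ranks)


# Hypothetical family ranked 2nd, 5th, and 9th among 40 families.
print(round(ancestral_index(2, 5, 9, 40), 2))
```

Since the scores come from ranks within one sample, the resulting index locates a family only relative to that sample, which is the limitation noted in the text.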
For the 28 subjects in the “change” groups who returned the
questionnaire, a correlation of .70 was obtained between ancestral
indices based entirely on the 1956 information, and those based on 1941
information. Since it seemed probable that the accounts by the mothers
were more accurate than those given 17 years later by the younger
generation, the 1941 information was utilized wherever possible.


TABLE 2
CORRELATION OF IQS WITH ANCESTRAL INDEX
(N = 98)

Columns: Test, r(a)

1931 Combined Forms L, M
1941 Form L
1956 Form L
1956 WAIS-Full Scale
1956 WAIS-Verbal
1956 WAIS-Performance

[Cell values are not legible in this copy.]

(a) Standard error of the correlation is .1015 when N is 98.

* Significant at .05 level.

** Significant at .01 level.


The resulting indices were then correlated with each set of test data,
as shown in Table 2. Whatever the nature of the relationship between
ancestral index and intelligence, it appears that the main effects have
occurred by the time of adolescence. The correlation of the ancestral
index with IQ did not quite reach statistical significance (p = >.1) in
1931 (preschool level); it was significant in 1941 (adolescent level);
it remained significant, and without appreciable change, in 1956 (adult
level). The t tests of the differences between these related
correlations were calculated according to a method suggested by
McNemar (1955, p. 146). The difference between correlations of the
ancestral index and test scores obtained in 1931 and 1941 reached a
borderline level of significance (p = .07), but none of the
correlations obtained in 1941 and 1956 differed significantly from each
other.
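Two of the quantities mentioned here can be checked with a short sketch. The standard error quoted in the note to Table 2 agrees with the usual null-hypothesis value 1/sqrt(N - 1). For the comparison of two correlations that share a variable, the sketch uses Hotelling's t; that this is the method on the cited page of McNemar is an assumption, and the correlations passed to it below are hypothetical.

```python
import math


def se_of_zero_correlation(n):
    """Standard error of r under the hypothesis of no correlation."""
    return 1.0 / math.sqrt(n - 1)


def t_dependent_correlations(r_xy, r_xz, r_yz, n):
    """Hotelling's t for r_xy versus r_xz measured on the same n subjects.

    x is the shared variable (here, the ancestral index); y and z are
    the two IQ testings; r_yz is the correlation between the testings.
    Returns t and its degrees of freedom (n - 3).
    """
    determinant = (1 - r_xy ** 2 - r_xz ** 2 - r_yz ** 2
                   + 2 * r_xy * r_xz * r_yz)
    t = (r_xy - r_xz) * math.sqrt((n - 3) * (1 + r_yz) / (2 * determinant))
    return t, n - 3


print(round(se_of_zero_correlation(98), 4))

# Hypothetical correlations, for illustration only.
t_value, df = t_dependent_correlations(r_xy=0.35, r_xz=0.15, r_yz=0.60, n=98)
print(round(t_value, 2), df)
```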


DISCUSSION 


Our data indicate that mental 
growth continues into adulthood (as 
others have reported). More specifi- 
cally, however, we found that a de- 
celeration of mental growth rate from 
preschool to adolescence is not likely 
to be followed by further (permanent) 
changes in growth rate up to adult- 
hood. On the other hand, acceleration 
of mental growth rate from preschool 
to adolescence is more likely to be 
followed by deceleration (of a much
smaller degree than the previous acceleration)
than by further acceleration
or by no change at all.

The finding that the factors related
to the rate of mental growth
had their greatest effect during the
earlier years of life is consistent with
findings of the several other longitudi- 
nal studies which have correlated var- 
ious parental and environmental fac- 
tors with variation in children’s IQ. 
For example, Bayley (1940) in the 
Berkeley Growth Study and Honzik 
(1940) in the Berkeley Guidance 
Study reported that the correlation 
between various economic and cul- 
tural indices in the home increased 
from 18 months to, but not beyond, 
various early childhood ages. 

This study was not, of course, ap- 
proached with the expectation of shed- 
ding light on the fast-extinguishing 
nature-nurture controversy. The hope 
was, however, to establish some prac- 
tical rules-of-thumb to aid the diag- 
nostician who is called upon to make 
an estimate of a child’s future mental 
status. While no precise recipe has 
been derived, these data seem to in- 
dicate that the psychologist who tests 
a preschool child would be wise to 
shade his estimate of future status 
somewhat in the direction of the gen- 
eral family level of intellectual ability. 
On the other hand, this would not hold for the junior high school
counselor. So far, we have little which will improve his estimate
beyond the IQ obtained under favorable testing conditions. The
counselor’s chances of being reasonably accurate even without such
further aids, however, are considerable; as noted earlier, the
correlation between the results of the administrations of Form L in
1941 and 1956 is .85 for our group.


SUMMARY 


In 1956, Form L of the Stanford-Binet and the WAIS were administered
to 110 young adults who had previously been given the Stanford-Binet at
preschool and adolescent ages. In 1941, those subjects whose IQs had
increased or decreased significantly since 1931 were selected for
special attention.

Present findings: 

1. For each group, regression to- 
ward the mean occurred between 1941 
and 1956, but the change in this 
period was small compared with the 
previous 10 years. 

2. Subjects whose IQs had increased 
between preschool and adolescence 
showed more reversal in their posi- 
tions in the distribution of the total 
sample than did those whose IQs had 
decreased. 

3. Significant changes in position in 
the distribution of IQs at preschool 
and the distribution of IQs of the same 
group at adolescence are related to 
ancestral intelligence; this does not 
hold true between adolescence and 
adulthood. 
There is a growing interest in the screening of school beginners both
as a means of determining the readiness for learning and as a way to
spot children with potential difficulties early so that they can be
given the required attention and help when it will be most beneficial.
The Lee-Clark Reading Readiness Test (Lee & Clark, 1951) and the
Metropolitan Readiness Test (Hildreth & Griffith, 1946) are two well
standardized and widely used screening tests. The present study
explores the usefulness of the Bender Gestalt test (Bender, 1938) as a
screening tool for beginning first grade students.


METHOD 


Subjects. The subjects were 272 beginning first grade students from 11
classes in seven different schools; none were repeating the first
grade. The schools selected represent a socioeconomic cross section and
are located in rural, semirural, suburban, and urban areas.

Procedure. During the first 6 weeks of the school year eight of the
teachers administered the Lee-Clark Reading Readiness Test to their
classes, while three other teachers gave the Metropolitan Readiness
Test to their groups. During this same period each subject was seen
individually by a psychologist who administered the Bender Gestalt
test. At the end of the school year the Metropolitan Achievement Test,
Primary I Battery, Form R, was administered to all 11 classes by their
respective teachers. The Readiness tests and the Achievement test were
scored by the teachers following standard procedure for each test. The
Bender protocols were scored by the psychologists according to a system
developed by the senior investigator for use with school children
(Koppitz, 1958, 1960).

Actual achievement as measured on the Total Average Achievement score
of the Metropolitan Achievement Test was compared for each subject with
his predicted achievement on each of the screening tests. Predictive
scores were derived from the grade equivalent scores on the Readiness
tests and from standard deviation scores on the Bender. Pearson
product-moment correlations were computed between the Readiness tests
and the Bender, and between the three screening tests and the
Metropolitan Achievement Test.

* The authors wish to express their appreciation for the assistance
provided by Sam Bonham and the staff of the Pupil Personnel Department
of Montgomery County, as well as by the teachers and principals of the
schools included in this study.

RESULTS AND DISCUSSION 


Table 1 shows the correlations between the various tests, all of quite
similar magnitude and all statistically significant. Thus it appears
that the Bender can predict Total Average Achievement as measured on
the Metropolitan Achievement Test as well as the Lee-Clark and the
Metropolitan Readiness Test. It is, of course, realized that the
Lee-Clark Reading Test is specifically designed to test reading and is
not meant to predict total average achievement. However, in discussing
the validity of the Reading Readiness test, Lee and Clark quote Henig
(1949), who compared the Reading Readiness test scores with actual
reading grades at the end of the school year and obtained a correlation
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TABLE 1 
CORRELATIONS BETWEEN THE BENDER,
READINESS TESTS, AND ACHIEVEMENT
TEST


Lee-Clark 
& Bender* 


Lee-Clark &| Bender* & 


School | N | "Met. Ach. | Met. Ach 


—_ 67** 


53 aq .64** 
.42°° — .37** 30* 
.54** —.41** 21 
.40* — .58** 33 

24 eT — .61** 54** 


— 68** -— .§]°* 


199 .66** 


Met. Read 
Met. Ach Met. Ach 


Bender* & | Met. Read 
& Bender* 


71** | —.73°° 


F 31 .63°* - 3 
G 42 .66** — .29 —.41** 


— .59°* 


Total | 73 


.59** 5R8** 


* All correlations with the Bender are negative since 
the Bender is scored for errors. 

* Significant at .05 level. 

** Significant at .01 level. 


of .59 which is similar to the results 
obtained in this study. 
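
The sign convention in Table 1 can be illustrated with a minimal sketch of the Pearson product-moment correlation. The scores below are hypothetical and serve only as an illustration (they are not data from this study), and the function name `pearson_r` is an assumption of this sketch. Because the Bender is scored for errors, a pupil with more errors would be expected to show lower achievement, so the coefficient comes out negative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six pupils (illustration only, not study data).
bender_errors = [2, 4, 5, 7, 9, 12]            # more errors = poorer performance
achievement = [2.6, 2.4, 2.1, 1.9, 1.6, 1.2]   # grade-equivalent scores

r = pearson_r(bender_errors, achievement)
print(round(r, 2))  # negative: more errors go with lower achievement
```

The magnitude of the coefficient, not its sign, is what is compared across the screening tests.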

The discrepancies between the vari- 
ous predictions of achievement and the 
actual achievement of each subject re- 
vealed some differences between the 
three screening tests. It appears that 
the Lee-Clark test tends to overesti- 
mate achievement more often than the 
Bender and the Metropolitan Readi- 
ness Test. The latter two tests tend to 
underestimate achievement more of- 
ten. A follow-up investigation on those 
subjects who had marked discrepan- 
cies between their actual and pre- 
dicted achievement suggests that the 
Lee-Clark test is more strongly influ- 
enced by cultural and social factors, 


while the Bender reveals apparently 
more the potential ability in visual- 
motor perception which may or may 
not be fully developed and put to use. 
Further exploration of these tentative 
findings seems indicated. 


SUMMARY 


The Bender Gestalt test and the 
Lee-Clark Reading Readiness Test or 
the Metropolitan Readiness Test, re- 
spectively, were administered to 272 
beginning first grade students. Test 
scores were correlated with each other 
and with actual achievement at the end 
of the school year. It was found that 
the Bender correlates well with the 
Readiness tests and can predict actual 
achievement as well as they can. 
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It usually is assumed that what goes 
on among pupils in a classroom is de- 
pendent to a major degree upon the 
presence of a teacher—a teacher whose 
role is to provide leadership in “learn- 
ing” activities at which the pupils 
themselves are initiates, unprepared 
without assistance to determine ap- 
propriate goals or to select and employ 
suitable ways and means of attaining 
such goals. The teacher, by virtue of 
education and experience, is expected 
to be able to guide and assist his pupils 
in the acquisition of usable knowledge, 
understandings, skills, and group ap- 
proved attitudes and value systems. It 
is unthinkable, except in very rare
cases, that individual children or
groups of children entirely on their
own might successfully pursue the
kinds of objectives that permit ad- 
justment to the complex social and 
material worlds in which we live. 
Qualitatively, at least, the logic of 
this argument appears sound: i.e., pu- 
pil behavior is a function of teacher 
behavior, the teacher being a neces- 
sary, though not sufficient, condition 
for purposeful and productive pupil 
performance. Beyond this point we
usually are less willing to express
ourselves regarding the nature of the
pupil behavior-teacher behavior
relationship. We like to think specified,
distinguishable teacher acts or teacher
behaviors produce, or at least contribute
to, certain distinguishable pupil acts or
behaviors (Mitzel, 1960; Ryans, 1957,
1960). Some of us go on to postulate
quantitative or functional relationships
between kinds of teacher behavior and
pupil behavior, assuming that more of a
certain kind of teacher behavior will
result in more effective pupil behavior,
and proceed to gather data and evaluate
the evidence (Anderson & Brewer, 1946;
Christensen, 1960; Cogan, 1958; Gnagey,
1960; Jayne, 1945; Withall, 1952; and
others). However, such functional
relationships are not easy to demonstrate
empirically, partly because of
difficulties of observation and
mensuration and, undoubtedly, partly
because of the complex interactions among
teacher and pupil characteristics (Ryans,
1956). Certainly research that might
permit inferences regarding dependency
relationships (producer-product
relationships), with classroom situations
providing the setting, is difficult to
arrange and carry through, largely due to
limitations imposed by practicality.

¹ This research was made possible by a
subvention from the Grant Foundation and
was sponsored by the American Council on
Education. It was reported at the 1960
meetings of the American Educational
Research Association.

A correlational approach (less satis- 
factory than a dependency relations 
strategy for yielding the definitive 
kinds of answers a researcher often 
seeks, but nevertheless providing im- 
portant information about the exist- 
ence and degree of functional relation- 
ships) often poses much less of a data 
obtaining problem and, at the same 
time, provides an alternative avenue 
to clues and cues—in this case, re- 
garding the influence of teacher be- 
havior on pupil behavior. 

It was this second sort of relation- 
ship (the interdependency or correla- 
tional relationship) between “what 





PUPIL BEHAVIOR AND TEACHER CHARACTERISTICS 83 


teachers do” and “what pupils of the 
teacher do” with which the presently 
reported investigation was concerned: 
with the degree to which certain in- 
dexes of pupil behavior tend to be 
functionally related to, or to manifest 
variance in common with, certain in- 
dexes of teacher behavior. 


METHOD 

First of all, it is necessary to point 
out that the approach to pupil be- 
havior employed in this research was 
a direct one, involving the immediate 
observation and assessment of the
behavior of pupils. The pupils were
observed in their regular classrooms in
the presence of their teachers, teachers
who also were observed and assessed.

Admittedly, this directly observed 
pupil behavior in process may be quite 
different from the products resulting 
from pupil behavior (Mitzel, 1960; 
Mitzel & Gross, 1956; Ryans, 1957, 
1960). Researchers in the area sometimes
indicate it is more important and
relevant to judge pupil behavior in terms
of its products (the measurable skills,
understandings, and attitudes acquired by
pupils) than merely to assess samples of
ongoing pupil behavior. After all, they
point out, it is the end product we are
teaching for.

Now the method employed in the present
research does involve an apparently
reasonable, though certainly unproved,
assumption: namely, that pupils who carry
on their classroom activities in certain
ways will acquire intended learnings more
successfully than pupils who behave or
act in other ways. But the fact remains
that education is interested ultimately
in pupil behavior and in teacher behavior
for the results produced; immediate
behavior is of interest only insofar as
it is a means to an end. So, one might
contend that the pupil behavior with
which this paper deals is of only
incidental concern. The author does not
agree, but it is a point of methodology
that should be made clear at the outset.

Next, it is appropriate to explain 
what kinds of directly observable pu- 
pil behavior the research dealt with. 

Obviously many specific behaviors, 
or acts, may be noted in a classroom 
even during a very limited sample of 
time. One thing assumed in this re- 
search (a basic postulate of trait the- 
ory that makes the researcher’s task 
tolerable) was that individual behav- 
iors possess some generality and there- 
fore may be classified in a fairly lim- 
ited number of classes or groups, each 
group being made up of related specific 
behaviors that have a common sub- 
stratum which identifies the group or 
class. Thus, a number of pupil behaviors
such as “rudeness to teacher and/or other
pupils,” “interruption of one another,”
“impatience,” “refusal to participate,”
“quarrelsomeness,” “sullenness,”
“disturbing noisiness,” etc. might all be
considered as belonging to a class of
behaviors we could, for convenience,
label obstructiveness. Again, pupil
behaviors such as “courteousness,”
“friendliness,” “attentiveness to advice
or criticism,” “diligence in completing
study assignments,” “initiation of
constructive activity in the absence of
specific direction from the teacher,”
etc. could be thought of as belonging to
a class of behaviors we might call
responsibility.

In this research we selected (after
considerable study of previous classroom
and “personality” research and after a
good deal of trial and error involving
classroom observation and assessment) and
focused our attention on a limited number
of “important” pupil behavior dimensions.
The term “dimension” was used with intent,
since in planning the research it seemed
reasonable to attempt to identify and
assess acts of pupils in the classroom,
first, in light of their qualitative
characteristics (the general class of
behaviors to which a specific act
belonged) and then, according to their
positions on quantitative scales defined
by subclasses of behaviors falling at
opposite poles of the assumed continua.
Thus, one bipolar dimension of pupil
behavior selected because it was judged
to be an important one in the classroom
was the Apathetic-Alert dimension.
Specific pupil behaviors that seemed to
contribute to alertness (e.g., “responded
eagerly,” “appeared anxious to recite and
participate,” “watched teacher
attentively,” etc.) defined one end of
the continuum and pupil acts that
appeared to contribute to apathy
(“listless,” “restless,” “wandering
attention,” “indifference,” etc.) defined
the other.

In attempting to assess pupil behavior,
trained observers employed several such
dimensions, assigning each class of pupils
observed a value, on a scale extending
from 1 to 7, on each dimension. An
assessment of 1 (at the left pole) on the
Apathetic-Alert dimension indicated an
inference presumably based upon
observation of a preponderance of
apathetic-type behaviors among the pupils;
a 7 (at the right pole) indicated an
inference based upon the presumed
observation of many alert-type pupil
behaviors. A “Glossary,” which accompanied
the assessment form and which was employed
in the training of observers, helped to
standardize the assessment procedure.

Estimates of the reliability of assessment 
of the pupil behavior dimensions, based on 
correlations between independent assess- 
ments of two observers visiting the same 


TABLE 1 
LOADINGS OF OBSERVED DIMENSIONS OF PUPIL BEHAVIOR AND TEACHER BEHAVIOR ON A
PUPIL BEHAVIOR FACTOR: SECONDARY SCHOOL CLASSES


Classroom Behavior Dimension 


Pupil Behavior: 
Apathetic-Alert 
Obstructive-Responsible 
Uncertain-Confident 
Dependent-Initiating 

Teacher Behavior:* 
Partial-Fair 
Autocratic-Democratic 
Aloof (G)-Responsive 
Aloof (1)-Responsive 
Restricted-Understanding 
Harsh-Kindly 
Dull-Stimulating 
Stereotyped-Original 
Apathetic-Alert 
Unimpressive-Attractive 
Inarticulate-Articulate 
Monotonous (V)-Pleasant 
Evading-Responsible 
Erratic-Steady 
Excitable-Poised 
Uncertain-Confident 
Disorganized-Systematic 
Inflexible-Adaptable 
Pessimistic-Optimistic 
Immature-Integrated 
Narrow-Broad 


(From Ryans and Wandt, 1952) 


Oblique Factor 


( D 


00 
19 
20 
— .06 


Note.—N = 249 senior high school mathematics, science, English, and social studies classes 
* Loadings of teacher behavior dimensions are omitted here for factors on which the loadings of pupil behavior
dimensions were not pronounced.

classes at different times ranged from .43 to
.65 for 1,518 elementary school classes
observed, and from .43 to .63 for 1,911
secondary school classes.
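
A reliability estimate of this kind can be sketched as the correlation between two observers' independent ratings of the same classes on one 1-to-7 dimension. The ratings below are hypothetical, and the function name `interobserver_reliability` is an assumption of this sketch rather than a term used in the study.

```python
from statistics import mean, pstdev


def interobserver_reliability(first, second):
    """Pearson correlation between two observers' ratings of the same classes."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need paired ratings for at least 2 classes")
    m1, m2 = mean(first), mean(second)
    s1, s2 = pstdev(first), pstdev(second)
    if s1 == 0 or s2 == 0:
        raise ValueError("an observer who gives every class the same rating "
                         "yields no usable correlation")
    # Mean cross-product of deviations, divided by the two standard deviations.
    covariance = mean([(a - m1) * (b - m2) for a, b in zip(first, second)])
    return covariance / (s1 * s2)


# Hypothetical Apathetic-Alert ratings (1 to 7) of eight classes.
observer_a = [3, 5, 4, 6, 2, 7, 4, 5]
observer_b = [4, 5, 3, 6, 3, 6, 5, 4]

reliability = interobserver_reliability(observer_a, observer_b)
print(round(reliability, 2))
```

Coefficients in the range reported here indicate agreement that is far from perfect, which is one reason at least two sets of assessments were combined for each class.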

Teacher behavior, observed and assessed
following the same procedure as that
described for pupil behavior, was
considered in light of dimensions such as
Harsh-Kindly, Disorganized-Systematic,
Dull-Stimulating, etc. (see Tables 1 and
2). Reliability estimates of the
assessments of the several dimensions of
observed teacher behavior, based on
correlations of the assessments of the
first observer and second observer,
generally were between .50 and .60; for
elementary teachers coefficients for the
different dimensions clustered around .55,
and for secondary school teachers, around
.60.

A minimum of two independent sets of 
assessments was obtained for each class of 


pupils and for each teacher, the separate 
ratings subsequently being weighted equally 
(by transformation to a scale employing a 
common mean and standard deviation) and 
combined to form composite assessments for 
each teacher and each class of pupils on each 
of the bipolar dimensions considered. 
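
The equal weighting described above can be sketched as follows. Each observer's ratings are first rescaled to a common mean and standard deviation, so that a lenient and a strict observer contribute equally, and the rescaled values are then averaged class by class. The target mean of 50 and standard deviation of 10 are assumptions made for illustration (the report does not state which common scale was used), as are the ratings and function names.

```python
from statistics import mean, pstdev


def to_common_scale(ratings, target_mean=50.0, target_sd=10.0):
    """Rescale one observer's ratings to a shared mean and standard deviation."""
    m, s = mean(ratings), pstdev(ratings)
    if s == 0:
        raise ValueError("ratings must vary to be rescaled")
    return [target_mean + target_sd * (r - m) / s for r in ratings]


def composite_assessment(ratings_by_observer):
    """Average the rescaled ratings of several observers, class by class."""
    rescaled = [to_common_scale(r) for r in ratings_by_observer]
    return [mean(per_class) for per_class in zip(*rescaled)]


# Hypothetical ratings of the same four classes on one dimension.
lenient_observer = [5, 6, 7, 6]
strict_observer = [2, 3, 4, 3]

composite = composite_assessment([lenient_observer, strict_observer])
print([round(c, 1) for c in composite])
```

In this example the two observers rank the classes identically but use different parts of the 7-point scale; after rescaling, their ratings coincide, so neither observer dominates the composite.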

The teachers and classes of pupils upon 
which data of this report are based, gener- 
ally, were from the school systems of fairly 
large cities. Geographically, the midwest and 
west accounted for a majority of the school 
systems sampled (93%). The classes repre- 
sented 274 elementary schools and 103 sec- 
ondary schools. Because of dependence upon 
the voluntary cooperation both of admin- 
istrative offices responsible for school sys- 
tems and of individual teachers, the sample 
was not selected so as to constitute a known 
probability sample. 


TABLE 2 
LOADINGS OF OBSERVED DIMENSIONS OF PUPIL BEHAVIOR AND TEACHER BEHAVIOR ON
TWO FACTORS INVOLVING, IN PART, “PUPIL PARTICIPATION” AND “CONTROLLED
PUPIL ACTIVITY”: ELEMENTARY SCHOOL CLASSES
(From Ryans, 1952)


Classroom Behavior Dimension 


Pupil Behavior: 
Disinterested-Alert 
Obstructive-Constructive 
Restrained-Participating
Rude-Self-Controlled 
Apathetic-Initiating 
Dependent-Responsible 

Teacher Behavior:* 
Partial-Fair 
Autocratic-Democratic 
Aloof (G)-Responsive 
Restricted-Understanding 
Unattractive-Attractive 
Disorganized-Systematic 
Inarticulate-Fluent 
Inflexible-Adaptable 
Harsh-Kindly 
Apathetic-Alert 
Aloof (I)-Responsive 
Stereotyped-Original 
Changeable-Constant 
Excitable-Calm 
Uncertain-Confident 
Irresponsible-Responsible 
Pessimistic-Optimistic 
Infantile-Mature 


Oblique Factor 





Note.—N = 275 third and fourth grade elementary school classes 
* Loadings of teacher behavior dimensions are omitted here for factors on which the loadings of pupil behavior
dimensions were not pronounced.
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RESULTS 
Estimates of pupil behavior and of 
teacher behavior, obtained by the di- 
rect observation procedures described 
above, were subjected to correlational 
analyses. It should be noted that the 
findings, therefore, have to do simply 
with concomitance and permit infer- 
ences of interdependency relationships 
only. Inferences regarding “anteced- 
ent-consequent” or “producer-prod- 
uct” relationships may be suggested 
by such data, but the design employed 
precludes the testing of dependency 

relationship hypotheses. 


Relationships among the Pupil and 
Teacher Behavior Dimensions 

One set of findings to be reported 
has to do with the extent to which the 
observed pupil behavior dimensions 
were related to the concurrently ob- 
served teacher behavior dimensions. 

Here, in fairness to inferences that 
may be drawn, it should be noted 
that the pupil behaviors of the class of 
a given teacher were observed and as- 
sessed during the same session, or pe- 
riod, during which the teacher of that 
class was observed and assessed. Ob- 
viously there well may be contamina- 
tion of one set of assessments by the 
other. (It would have been possible to 
at least partially correct for this, but 
it also would have added materially 
to both dollar and time expenditures; 
consequently, the choice was to recog- 
nize this source of error variance, but 
not to attempt to control it.) 

In reviewing the pupil behavior- 
teacher behavior relationship findings 
the results for elementary teachers and 
secondary teachers will be presented 
separately; inferences that may be 
drawn are somewhat different. The in- 
teraction of grade level with relation- 
ships of this sort appears to be pro- 
nounced. 

Relationships in the secondary school.
In the secondary school (Ryans & Wandt,
1952), all of the pupil behavior
dimensions considered in this research
tended to be interrelated, the
intercorrelations ranging from .42 to .68.
They tended to hang together as a single
cluster, all contributing to the same
oblique factor when the intercorrelations
of pupil behavior and teacher dimensions
were factor analyzed. Table 1 shows
relevant factor loadings. It may be noted
from Table 1 that the factor defined by
the distinctly homogeneous cluster of
pupil behavior dimensions was not
significantly contributed to by the
teacher behavior dimensions assessed, with
one exception (the Dull-Stimulating
teacher behavior dimension). Of all the
teacher behavior dimensions, only that
which had to do with the extent to which a
teacher was judged Dull-Stimulating seemed
to be closely associated with the pupil
behavior dimension cluster.

In the secondary school, pupil behavior
(as here defined and assessed) seemed not
to be as closely related to teacher
behavior as we sometimes assume it to be.

Relationships in the elementary school.
In the elementary school (Ryans, 1952) we
note a distinctly different situation.
Table 2 presents results comparable to
those shown for the secondary school in
the preceding table. With the elementary
school data, however, the tight clustering
of pupil behavior dimensions, relatively
independent of teacher behavior dimensions
(noted in the high school), is absent.
Instead, the pupil behavior dimensions
investigated (with intercorrelations
ranging from .21 to .65) split up and fell
into one or the other of two loosely
correlated (.08) factors, or groups, of
combined pupil behavior and teacher
behavior dimensions.

In elementary school classrooms, the
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loadings of the pupil behavior dimen- 
sions thus fell on two somewhat dif- 
ferent factors, rather than on a single 
one, as was the case when secondary 
school classrooms were considered. 
The pupil behavior dimensions
contributing to the first factor appeared
to be ones having to do with pupil
participation. And related to this set of
pupil dimensions were those teacher
behavior dimensions that had been labeled
Stereotyped-Original,
Autocratic-Democratic and
Inflexible-Adaptable, traits not entirely
unlike the Dull-Stimulating teacher
behavior dimension in the secondary school
analysis. These kinds of relationships
appear to make sense, common as well as
statistical.

The pupil behavior dimensions, which as a
group might be referred to as controlled
pupil activity, were loaded on a second
factor, which also included such teacher
behavior dimensions as Irresponsible
(evading)-Responsible,
Disorganized-Systematic,
Changeable-Constant, Infantile-Mature, and
Uncertain-Confident, teacher
characteristics that suggest disorganized,
irresponsible vs. businesslike,
responsible teacher behavior.

In the elementary school, then, pupil
behavior and teacher behavior seemed to be
more noticeably interdependent than in the
secondary school, with participating pupil
behavior seeming to be related to
flexible, original, democratic teacher
behavior and controlled pupil behavior to
systematic, responsible teacher behavior.

It also may be noted parenthetically that
the analyses undertaken in connection with
this research yielded evidence suggestive
of a moderate positive relationship
between the factors contributed to by the
pupil behavior dimensions and those
teacher behavior dimensions that loaded on
a factor which seems to refer to “the way
the teacher appears” to his or her class
(teacher behavior dimensions such as 
Monotonous-Pleasant (voice), Unim- 
pressive-Attractive, Inarticulate-Ar- 
ticulate, and Uncertain-Confident con- 
tributed to this teacher appearance 
pattern). This sort of relationship be- 
tween pupil behavior and teacher ap- 
pearance seemed to hold to some 
degree among both elementary and 
secondary school classes. In the sec- 
ondary school, the median correlation 
between the teacher behavior dimen- 
sions that contributed to teacher ap- 
pearance and the pupil behavior di- 
mensions was .27; and the correlation 
between the teacher appearance factor 
and the factor dominated by pupil be- 
havior (Factor C of Table 1) was .33. 
In the elementary school, the median 
correlation between assessments on the 
teacher behavior dimensions that re- 
lated to teacher appearance and as- 
sessments of the pupil behavior di- 
mensions was .37; and the correlations 
between the teacher appearance factor 
and the factors contributed to by the 
pupil behavior dimensions (Factors A 
and B of Table 2) were .57 and .23, 
respectively. 


Relationships Based on an Overall In- 
dex of Pupil Behavior 


Now, to turn to another related as- 
pect of the analyses which supports 
the foregoing findings, although com- 
ing at the problem in a slightly differ- 
ent manner: 

In an extension of the studies just
discussed, but with new independent
samples of teachers and classes, an index
(based on the combination of assessments
for teacher behavior dimensions that
loaded most highly on factors generated by
the earlier factor analyses; Ryans, 1952,
and Ryans & Wandt, 1952)² was obtained for
each teacher on each of three prominent
patterns of teacher behavior, and the
teacher behavior pattern scores were then
considered in relation to pupil behavior
indexes (a composite score for each class,
Po, obtained from the assessments assigned
the pupil behavior dimensions)³ of the
teachers’ classes. The teacher behavior
patterns considered were ones suggested by
the earlier factor analyses: (a) aloof,
restricted, egocentric vs. kindly,
understanding, warm teacher behavior; (b)
evading, unplanned, slipshod vs.
responsible, systematic, businesslike
teacher behavior; and (c) dull, routine
vs. stimulating, imaginative teacher
behavior.

TABLE 3
CORRELATIONS BETWEEN THE PUPIL BEHAVIOR INDEX AND TEACHER BEHAVIOR PATTERNS

Sample
Elementary school classes
834 Grades 1-6 classes
144 Grades 1-6 classes
Secondary school classes
497 Mathematics and science classes
568 English and social studies classes
114 Mathematics, science, English, and social studies classes

Note. Po = pupil behavior (composite)
Xo = kindly, understanding vs. aloof, restricted teacher behavior
Yo = responsible, businesslike vs. evading, unplanned teacher behavior
Zo = stimulating, imaginative vs. dull, routine teacher behavior

² Assessments of each observer on each
dimension of teacher behavior were
transformed into standard scores. Then for
each pattern of teacher behavior (Xo, Yo,
Zo) the standard scores of the several
components were summed to provide the
three pattern scores, based on the
observer’s assessments of a particular
teacher. These pattern scores were, in
turn, transformed into standard scores for
each observer; and standard scores for
independent observers were combined to
yield a composite assessment on each of
the behavior patterns for each teacher.

³ The pupil behavior index was obtained
simply by (a) summing for each class each
observer’s assessments (previously
transformed to standard scores) on the
several pupil behavior dimensions and (b)
computing the mean of the equally weighted
indexes contributed by the several
observers who assessed a particular class
of pupils.
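
The scoring procedure for the teacher behavior patterns can be sketched as below, assuming two observers, four teachers, and two component dimensions for a single pattern. All ratings are hypothetical. The function names, the assignment of these two dimensions to one pattern, the use of the population standard deviation, and the averaging of observers' scores as the way of "combining" them are assumptions of this sketch.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to standard scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("values must vary to be standardized")
    return [(v - m) / s for v in values]


def observer_pattern_scores(ratings_by_dimension):
    """Pattern scores from ONE observer: standardize each component dimension
    across teachers, sum the components per teacher, then restandardize."""
    z_by_dimension = [standardize(r) for r in ratings_by_dimension.values()]
    summed = [sum(per_teacher) for per_teacher in zip(*z_by_dimension)]
    return standardize(summed)


def composite_pattern_scores(observers):
    """Combine the standardized pattern scores of independent observers."""
    per_observer = [observer_pattern_scores(o) for o in observers]
    return [mean(per_teacher) for per_teacher in zip(*per_observer)]


# Hypothetical 1-to-7 ratings of four teachers by two observers.
observer_1 = {"Harsh-Kindly": [2, 4, 5, 7],
              "Restricted-Understanding": [3, 4, 6, 6]}
observer_2 = {"Harsh-Kindly": [3, 3, 5, 6],
              "Restricted-Understanding": [2, 5, 5, 7]}

pattern_scores = composite_pattern_scores([observer_1, observer_2])
print([round(score, 2) for score in pattern_scores])
```

Restandardizing within each observer before combining keeps an observer whose summed scores happen to spread more widely from carrying extra weight in the composite.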

Reliabilities of the indexes (scores) for
the patterns of teacher behavior just
named (based on correlations between the
indexes yielded by assessments of
different observers who independently
assessed the same teachers) ranged from
.70 to .80. Reliability estimates of the
pupil behavior index were slightly lower:
e.g., for one sample of 98 classes, when
the pupil behavior dimensions were
separately summed for two observers and
the two sets of resulting Po values
correlated, reliability coefficients of
.62, .84, .72, and .64 were obtained,
respectively, when the observations were
made approximately 30 minutes apart, more
than 30 minutes apart but during the same
half-day, during different halves of the
day (forenoon or afternoon) but the same
day, and at least one month apart but
during the same school year.

Correlations between pupil behavior 
and patterns of teacher behavior based 
on approximately 1,000 elementary 
school classes and a like number of 
secondary school classes are presented 
in Table 3. 

Secondary school relationships. 
When the teacher behavior pattern 
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scores were correlated with the pupil 
behavior index for secondary school 
classes of mathematics, science, Eng- 
lish, and social studies, the evidence 
suggested low positive relationships 
between pupil behavior and teacher 
behavior, the obtained correlation co- 
efficients between the pupil behavior 
index and the three teacher behavior 
pattern scores being between .07 and 
.26. It probably may be inferred that 
the relationship is not pronounced in 
secondary school classes. 

Elementary school relationships.
With elementary school classes, the 
pupil behavior index was found to be 
uniformly highly correlated with all 
three teacher behavior pattern scores 
(again, as suggested by the analyses 
reported in the preceding sections of 
this paper), the correlation coeffi- 
cients ranging from .75 to .83. (It may 
be observed that these pupil behavior- 
teacher behavior correlations in the 
elementary school were as high as the 
reliability coefficients for the several 
pattern scores.) 
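
Why correlations as high as the reliabilities are noteworthy can be shown with a small calculation. Under classical test theory, and assuming the errors in the two measures are uncorrelated, an observed correlation is not expected to exceed the square root of the product of the two reliabilities. The sketch below applies that bound to the reliability figures quoted earlier in this report; treating those figures as directly applicable to the composite scores in Table 3 is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def max_observable_r(reliability_x, reliability_y):
    """Expected ceiling on an observed correlation, sqrt(rxx * ryy),
    under classical test theory with uncorrelated errors."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return sqrt(reliability_x * reliability_y)


# Figures from the text: teacher pattern scores .70 to .80,
# pupil behavior index .62 to .84.
lowest_ceiling = max_observable_r(0.70, 0.62)
highest_ceiling = max_observable_r(0.80, 0.84)
print(round(lowest_ceiling, 2), round(highest_ceiling, 2))  # 0.66 0.82
```

Observed coefficients of .75 to .83 sit at or near this ceiling, which is consistent with the caution stated earlier that assessing pupils and teacher in the same session may contaminate one set of assessments with the other.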

Thus, for the kinds of teacher be- 
havior and pupil behavior described, 
the two separate analyses support the 
same conclusion: i.e., that teacher be- 
havior and pupil behavior show sub- 
stantially more interdependence in the 
elementary school as compared with 
the secondary. There also is the sug- 
gestion that of the teacher behavior 
dimensions and patterns studied, Dull- 
Stimulating teacher behavior may be 
more closely associated with second- 
ary school pupil behavior. 

Such findings, it should be recalled, 
are based upon correlational analysis, 
permitting interdependency, but not
dependency (producer-product), inferences
about the relationships involved.
Furthermore, the relationships reported
refer to group data (to groups of teachers
and their classes), and the probability
risks involved in applying conclusions to
specific situations should be kept in
mind.


SUMMARY 


Relationships between trained observers’
assessments of (a) classes of pupils and
(b) the teachers of those classes,
relative to selected dimensions of pupil
and teacher classroom behavior, were
studied.

For elementary school classes, high 
positive relationships were noted be- 
tween observers’ assessments of “pro- 
ductive pupil behavior” (e.g., assess- 
ments presumed to reflect pupil 
alertness, participation, confidence, 
responsibility and self-control, initiat- 
ing behavior, etc.) and observers’ as- 
sessments of previously identified 
patterns of teacher behavior which 
seemed to refer to understanding, 
friendly classroom behavior; organ- 
ized, businesslike classroom behavior; 
and stimulating, original classroom 
behavior. 

For secondary school classes, low 
positive relationships appeared to ob- 
tain between productive pupil behav- 
ior and the above named categories 
of teacher behavior, with a tendency 
for the stimulating, original teacher 
classroom behavior pattern to show 
a slightly higher correlation with pupil 
behavior than the understanding, 
friendly or the organized, businesslike 
teacher behavior patterns. 

The approach was correlational, 
permitting inferences of interdepend- 
ency, but not producer-product, rela- 
tionships. 
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In an earlier report, Ryans (1961) 
presented classroom observation data
indicative of: (a) among elementary school
classes, high positive relationships
between observers’ assessments of
“productive pupil behavior” (e.g.,
assessments presumed to reflect pupil
alertness, participation, confidence,
responsibility and self-control,
initiating behavior, etc.) and observers’
assessments of previously identified
patterns of teacher behavior which seemed
to refer to understanding, friendly
classroom behavior; organized,
businesslike classroom behavior; and
stimulating, original classroom behavior;
and (b) among secondary school classes,
low positive relationships between
productive pupil behavior and the above
named categories of teacher behavior, with
a tendency for the stimulating, original
teacher classroom behavior pattern to show
a slightly higher correlation with pupil
behavior than the understanding, friendly
or organized, businesslike teacher
behavior patterns.

If, as the evidence suggests,
interdependency relationships between
overt teacher behavior and overt pupil
behavior in the classroom are
demonstrable, a question which next is
suggested is whether or not similar
relationships exist between personal
characteristics of the teacher as revealed
by self-report inventory scores and overt
pupil behavior in teachers’ classes:
whether or not inventory estimated teacher
traits correlate with, and therefore may
be used to predict, pupil classroom
behavior. It was to this problem the
present research was addressed.

¹ This research was made possible by a
subvention from the Grant Foundation and
was sponsored by the American Council on
Education. It was reported at the 1960
meetings of the American Psychological
Association.

Two hypotheses derived from earlier
studies (Ryans, 1961) were considered:

1. Certain teaching oriented, trait 
(or “characteristic”) scores yielded by 
teachers’ responses to a self-report in- 
ventory, the Teacher Characteristics 
Schedule, covary in a nonrandom man- 
ner with indexes of observer assessed, 
overt pupil behavior in the teachers’ 
classes, permitting better-than-chance 
prediction of one from the other. 

2. Teacher characteristic-pupil be- 
havior relationships are more notable 
in elementary than in secondary school 


classes. 


METHOD 
Pupil Behavior Data

The classroom activities of pupils were
directly observed by trained and
experienced observers. The direct
observation approach involved a
reasonable, though certainly unproved,
assumption: namely, that pupils who carry
on their classroom activities in certain
ways will acquire the intended “learning”
(i.e., knowledge, understandings,
attitudes) more successfully than pupils
who behave in less seemingly productive
manners.

It is important to note the particular
kinds of pupil behavior that were observed
and assessed, thus providing an
operational definition of pupil
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classroom behavior as considered in 
this research. Obviously, the number 
of specific pupil behaviors occurring 
in a classroom is almost unlimited. It 
was necessary, therefore, to make the 
usual assumption that such specific be- 
haviors possess some generality and 
that they may be classified in a fairly 
limited number of groups or classes, 
each group being made up of related
behaviors possessing a common sub- 
stratum which identifies the behavior 
group. Thus, specific pupil acts involv- 
ing “rudeness to teacher and/or other 
pupils,” “interruption of one another,” 
“refusal to participate,” “quarrelsome- 
ness,” “disturbing noisiness,” and the 
like might reasonably be thought of as 
belonging to a class of behaviors la- 
beled, for convenience, obstructive- 
ness. Or, specific pupil behaviors 
involving “courteousness,” “friendli- 
ness,” “attentiveness to advice or criti- 
cism,” “diligence in completing study 
assignments,” “initiation of construc- 
tive activity on one’s own,” and the 
like, might be grouped together in a 
class of behaviors we might call re- 
sponsibility. 


The groups or categories of pupil be- 
havior selected for consideration were de- 
termined in light of review and analysis of 
previous classroom and personality research 
and after extensive pilot classroom obser- 
vation and assessment during preliminary 
phases of the study. 

For assessment purposes, the major pupil 
behavior categories to be considered were 
cast in the form of dimensions, thus, per- 
mitting the identification and assessment of 
observed pupil behavior (a) in the light of 
its essential qualitative characteristic (the 
general class of behaviors to which a spe- 
cific act belongs) and (b) according to its po- 
sition on a quantitative scale defined by sub- 
classes of behavior falling at opposite poles 
of the assumed continua. Thus, one bipolar 
dimension of pupil behavior that observers 
were trained to assess was an Apathetic-Alert 
dimension. Pupil acts or behaviors that 
seemed to contribute to alertness (e.g., “re- 
sponded eagerly,” “appeared anxious to re- 
cite and participate,” etc.) defined one pole 


of the dimension and behaviors contributing 
to apathy (e.g., “listlessness,” “restlessness,”
“wandering attention,” etc.) defined the 
other. 

Four such dimensions of pupil behavior 
were observed and assessed: Apathetic-Alert, 
Obstructive-Responsible, Uncertain-Confident,
and Dependent-Initiating.*

In assessing pupil behavior, trained ob- 
servers assigned each class of pupils ob- 
served a value, on a scale extending from 
1 to 7, on each dimension. An assessment of 
1 (at the left pole of the scale) on the Apa- 
thetic-Alert dimension indicated presumed 
observation of a preponderance of apathetic 
behaviors; an assessment of 7 (at the right 
pole) indicated presumed observation of 
many alert-type behaviors. Assessments were 
made for intact classes—for classes as a 
whole. A “Glossary,” which accompanied the 
assessment form and which was employed in 
the training of observers, helped to standardize
the assessment procedure.

Reliability coefficients for assessments of 
the several pupil behavior dimensions, based 
on correlation of independent assessments of 
first observer and second observer, ranged 
from .43 to .65 for some 1,500 elementary
school classes and from .43 to .63 for approximately
1,900 secondary school classes.

Intercorrelations among the pupil behav- 
ior dimensions were substantial, ranging 
from .42 to .68 for secondary school classes
and from .21 to .65 for elementary school
classes. (Factorially, the pupil behavior di- 
mensions for secondary school classes loaded 
on a single factor; for elementary school 
classes they contributed to two correlated 
factors, appearing to be roughly describable 
as pupil participation and controlled pupil 
activity.) 

A minimum of two independent sets of 
assessments was obtained for each class (dif- 
ferent observers observing the class at dif- 
ferent times), the ratings of different ob- 
servers subsequently being weighted equally 
(by transformation to a scale employing a 
common mean and standard deviation) and 
averaged to provide a composite assessment of each class of pupils on
each of the bipolar dimensions considered.

*It may be noted that none of these pupil behavior dimensions directly
reflects pupil learning (leastwise, not the learning of knowledge,
understandings, and skills toward which much of the school’s effort is
directed). But most classroom practices do assume active attentiveness,
purposeful and systematic study, reasoned self-assurance, and an
inquiring and initiating mind to be conditions of the pupil which are
conducive to learning.

A pupil behavior index, Po, for each class of pupils was obtained by
combining the averaged assessments on all four pupil behavior
dimensions, equally weighted, into a single value. (Combination of the
several dimensions into a single index seemed reasonable from a
practical standpoint, as well as in light of their intercorrelations,
which, as noted above, suggested a single secondary school cluster and
two by no means distinct elementary clusters.) It was with this overall
index, Po, that the inventory estimated teacher characteristics
subsequently were correlated.

The reliability of the pupil behavior index (the overall estimate
contributed to by the four dimensions) was moderate. In one analysis
(based on 98 classrooms) in which the pupil behavior dimension
assessments of two observers were separately summed and the resulting
Po values correlated, the interobserver correlations were .62, .84,
.72, and .64, respectively, when the observations were made
approximately 30 minutes apart, more than 30 minutes apart but within
the same half-day, during the same day but different half-day, and
during the same school year but with an interval of more than one
month.
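The construction of the composite index described above may be illustrated with a short sketch. It is offered only as an illustration under stated assumptions: the common scale (mean 50, standard deviation 10) is an arbitrary choice not given in the text, the function names `rescale` and `composite_index` are hypothetical, and each observer's ratings on a dimension are assumed to vary (a constant set of ratings cannot be rescaled).

```python
from statistics import mean, pstdev


def rescale(ratings, target_mean=50.0, target_sd=10.0):
    """Transform one observer's ratings to a common mean and standard deviation.

    The target values are illustrative assumptions, not figures from the study.
    """
    m, s = mean(ratings), pstdev(ratings)
    return [target_mean + target_sd * (r - m) / s for r in ratings]


def composite_index(observer_ratings):
    """Combine ratings into one index per class.

    observer_ratings[o][d][c] is the 1-7 rating given by observer o
    on dimension d to class c.
    """
    n_obs = len(observer_ratings)
    n_dims = len(observer_ratings[0])
    n_classes = len(observer_ratings[0][0])

    # Put every observer's ratings on the same scale, dimension by dimension.
    scaled = [
        [rescale(observer_ratings[o][d]) for d in range(n_dims)]
        for o in range(n_obs)
    ]

    index = []
    for c in range(n_classes):
        # Average the observers within each dimension (equal weights) ...
        per_dim = [
            mean(scaled[o][d][c] for o in range(n_obs)) for d in range(n_dims)
        ]
        # ... then average the dimensions (equal weights) into a single value.
        index.append(mean(per_dim))
    return index
```

Rescaling before averaging keeps an observer who uses a wider range of the 1-7 scale from dominating the composite, which appears to be the purpose of the equal weighting described in the text.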


Teacher Characteristics Data 


The Teacher Characteristics Schedule, 
employed to obtain estimates of teacher 
“traits” as described below, was an omnibus 
self-report type of inventory made up of 
items selected from a number of specially 
prepared separate instruments which had 
been subjected to a series of preliminary re- 
sponse-selection and validation studies. In 
its final form the schedule consisted of 300 
multiple choice and check-list items relating 
to personal preferences, self-judgments, fre- 
quently engaged-in activities, biographical 
data, and the like. Ten scores (the last one, 
a score on a control variable used only to at- 
tempt to identify individuals with a strong 
tendency to make “socially acceptable” re- 
sponses) were obtainable with the use of the 
Teacher Characteristics Schedule. Responses 
to the schedule had been selected and cross- 
validated (Ryans, 1960) to estimate teacher 
characteristics which were identified as: 

warm, understanding, friendly vs. aloof, egocentric, restricted
classroom behavior

responsible, businesslike, organized vs. evading, unplanned, slipshod
classroom behavior

stimulating, imaginative vs. dull, routine classroom behavior

favorable vs. unfavorable opinions of pupils

favorable vs. unfavorable opinions of democratic classroom procedures

favorable vs. unfavorable opinions of administrative and other school
personnel

learning centered (“traditional” or conservative) vs. child centered
(“permissive” or liberal) educational viewpoints

superior verbal understanding (comprehension) vs. poor verbal
understanding

emotional stability (adjustment or maturity) vs. instability

validity vs. invalidity of response (a control variable)

Generally, the reliability coefficients (stability and equivalence
estimates) for scores derived from the Teacher Characteristics
Schedule fell between .70 and .80. Validity coefficients (correlations
between schedule scores and [a] observers’ assessments of teacher
classroom behavior, or [b] direct inquiry-type items relating to
certain constructs or traits, e.g., attitudes, verbal ability, etc.)
were of varying magnitude, depending upon the aspect of validity
investigated (cross-validity, validity generalization, validity
extension, concurrent validity, predictive validity), the particular
teacher characteristic estimated, and the teacher sample (e.g.,
elementary teachers, secondary teachers, mathematics-science teachers,
etc.) for which a particular scoring key was derived and to which it
might most appropriately be applied. Concurrent validity coefficients
typically ranged from .20 to .50 and cross-validity coefficients
between .40 and .60. Coefficients of predictive validity were low,
exceeding .20 only in two or three instances.


Sampling and Analysis 


Data from over 2,000 elementary and secondary school classes, widely
scattered geographically, were collected and analyzed. For the most
part the classes represented school systems of relatively large
cities. Since cooperation of school systems, schools, and teachers was
of necessity voluntary, the sampling could not comply with conditions
theoretically required for comparison of the data with known
probability models.

The teacher characteristics-pupil behavior 
data were analyzed separately for seven sub- 
samples, three from elementary schools and 
four from secondary schools. The sizes of 
the subsamples varied, as shown in Table 
1 below, from 99 to 718 classes of pupils and 
teachers. In each sample, and for each of the

TABLE 1
Product-Moment Correlation Coefficients Between “Observer Assessed”
Pupil Behavior and Certain “Inventory Estimated” Teacher
Characteristics


Teacher Characteristic (with 
which “Observed’’ Pupil 


, - a . 

“Inventory Estimated } Elementary School Samples* Secondary School Samples” 
| 
| 

Behavior, Po,is compared) } 


B ’ D } , G 


Sex of teacher: M & F Female Male M&F M&F 
No. of classes: | 144 $10 441 114 | 99 


Understanding teacher a : ; 11* 07 .02 
behavior 
Businesslike teacher 36 2s A : 00 .16 
behavior 
Stimulating teacher 
behavior 
Favorable attitude to- ‘ f : 03 
ward pupils 
Favorable attitude 
toward democratic 
classroom 
Favorable attitude to 
ward administrators 
and colleagues 
Conservative educa- | —.14* 24* | —.09 03 — .07 01 -.14 
tional viewpoints* 
Verbal comprehension 07 .06 11 03 10* .12 — .03 
Emotional stability 13* .04 16* 08 11* 14 09 | —.02 
Validity of response — .07 .00 09 02 08 — 01 02 | —.11 


20* 


* Asterisk used to note coefficients that differ significantly (.05
level or beyond) from .00.

a Samples A and C comprised of classes of Grades 1-6 pupils and their
teachers; Sample B, of classes of Grades 3-6 pupils and their
teachers.

b Samples D, E, F, and G comprised of classes of secondary school
students and their teachers (roughly 22% mathematics classes, 25%
science classes, 30% English classes, and 23% social studies classes);
Sample H, of classes of foreign language students and their teachers.

c A high score on this characteristic reflects conservative, learning
centered, educational viewpoints and a low score liberal, child
centered viewpoints. Thus, a negative correlation between this teacher
characteristic and Po may be interpreted as indicative of a positive
relationship between liberal educational viewpoints held by the
teacher and productive pupil behavior in the teacher’s class.


teacher characteristics studied, Po of each class was paired with the
teacher’s score on the characteristic under consideration, and the
product-moment correlation coefficient computed.

RESULTS

The three elementary school samples suggest moderate, statistically
significant (.05 level or beyond) interdependency relationships
between assessments of observed purposeful and productive pupil
behavior and inventory estimated understanding, friendly teacher
behavior; organized, businesslike teacher behavior; and original,
stimulating teacher behavior.

Although the remaining correlation coefficients are less uniformly
significant from sample to sample, the evidence seems to point to a
low positive relationship between pupil behavior and the teacher
characteristics we have identified as favorable attitude toward
democratic classroom procedures and liberal or child centered
(permissive) educational viewpoints, and probably a similar low-order
relationship between pupil behavior and favorable attitude toward
pupils and emotional stability.

Samples A and C, involving female, and male and female teachers,
respectively, both yield low positive correlations between pupil
behavior and teachers’ favorable attitude toward administrators and
colleagues, but the coefficient for Sample B, male teachers, is of
zero order.

With the notable exception of verbal intelligence all of the
correlations between inventory estimates of teacher characteristics
and observed pupil behavior were statistically significant in Sample
A, the large sample of elementary classes taught by female teachers.

It seems likely that classroom pupil behavior is related to a number
of teacher characteristics, of the kinds here estimated, when study is
restricted to elementary school classes.

Secondary classes. The secondary school data suggest substantially
fewer statistically significant relationships between the assessments
of productive pupil behavior, as observed and assessed, and inventory
estimated characteristics of teachers as measured by the Teacher
Characteristics Schedule.
It would appear that the only gen- 
eralizable relationships are those rep- 
resented by low correlations between 
observed pupil behavior and inventory 
estimates of original, stimulating 
teacher behavior—and, perhaps, be- 
tween the pupil behavior index and 
the teacher characteristic interpreted 
as organized, businesslike classroom 
manner. 

Comparison of elementary and sec- 
ondary school relationships. As had 
been hypothesized from previously ob- 
tained evidence of correlations be- 
tween assessments of directly observed 
pupil behavior and directly observed 
teacher behavior, interdependency re- 
lationships between pupil behavior 
and inventory estimated teacher char- 
acteristics were found to be distinctly 


more apparent in elementary school 
classes and less discernible in second- 
ary school classes. 

For purposes of comparison of the 
pupil behavior-teacher characteristics 
correlations based on elementary vs. 
secondary school classes, consideration 
of the three elementary samples to- 
gether and the five secondary samples 
together appeared justifiable. (For 
each of the inventory estimated 
teacher characteristics, application of 
the appropriate chi square test for 
the z_r transformations showed the
values of the correlation coefficients 
between observed pupil behavior and 
the teacher characteristic under con- 
sideration to be no less homogeneous 
than might be expected from random 
sampling among the three elementary 
school samples and also among the 
five secondary school samples.) 
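The homogeneity check mentioned in the parenthetical note can be sketched as follows. The text does not give the computational details, so this is a minimal illustration that assumes the familiar textbook form of the test: each correlation is transformed with Fisher's z, weighted by n − 3, and the weighted squared deviations from the weighted mean z are summed to give a chi square with k − 1 degrees of freedom. The function name `homogeneity_chi_square` and the input values in the usage note are hypothetical.

```python
import math


def homogeneity_chi_square(rs, ns):
    """Chi-square test statistic for homogeneity of k independent correlations.

    rs: correlation coefficients, one per sample
    ns: corresponding sample sizes
    Returns (chi_square, degrees_of_freedom, weighted_mean_z).
    """
    zs = [math.atanh(r) for r in rs]          # Fisher z transformation
    ws = [n - 3 for n in ns]                  # 1 / sampling variance of each z
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    chi2 = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return chi2, len(rs) - 1, z_bar
```

With three samples the statistic has 2 degrees of freedom, and with five samples 4, so it would be compared with the tabled .05 critical values of 5.991 and 9.488, respectively. A value below the critical value, for example the result of `homogeneity_chi_square([0.30, 0.25, 0.35], [150, 300, 400])` with made-up inputs, is what justifies treating the samples as one group.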

When the pupil behavior-teacher characteristic correlation
coefficients of the three elementary samples are translated into z_r
values and combined to obtain the mean z_r, and this mean compared
with the mean z_r value obtained by combining the five secondary
school samples, the hypothesis that correlations between pupil
behavior and teacher characteristics are more visible in the
elementary as compared with the secondary school is supported, at
least for the teacher characteristics referring to warm,
understanding, friendly behavior; organized, businesslike behavior;
stimulating, imaginative behavior; favorable opinions of pupils;
favorable opinions of democratic classroom procedures; and child
centered or permissive (liberal) educational viewpoints. The mean
elementary z_r value was significantly higher (.05 level or beyond;
two-tailed test) than the mean secondary z_r value for the inventory
estimated teacher characteristics named. (Had the .10 level been
accepted as the criterion of significance, the pupil behavior-teacher
emotional stability correlation also would have been significantly
higher for the elementary compared with secondary school classes.)
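The comparison of the elementary and secondary mean z values can be sketched in the same way. The text does not say whether the means were weighted, so the sketch below assumes weights of n − 3 and a normal-deviate test of the difference; the function names `pooled_z` and `compare_pooled` are hypothetical, and the two-tailed probability is obtained from the standard library's complementary error function.

```python
import math


def pooled_z(rs, ns):
    """Weighted mean Fisher z for a set of samples and its sampling variance."""
    ws = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for w, r in zip(ws, rs)) / sum(ws)
    return z_bar, 1.0 / sum(ws)


def compare_pooled(rs_a, ns_a, rs_b, ns_b):
    """Normal-deviate test of the difference between two pooled z values.

    Returns (test_statistic, two_tailed_p).
    """
    z_a, var_a = pooled_z(rs_a, ns_a)
    z_b, var_b = pooled_z(rs_b, ns_b)
    stat = (z_a - z_b) / math.sqrt(var_a + var_b)
    # Two-tailed probability under the standard normal distribution.
    p = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p
```

Under these assumptions a statistic beyond ±1.96 corresponds to the two-tailed .05 criterion used in the text, and one beyond ±1.645 to the .10 criterion.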

It may be noted that tests of significance of the differences between
z_r values for the elementary Sample A and secondary Sample D, both
based on female teachers and their classes, also revealed significant
differences (.05 level or beyond) for the same teacher characteristics
listed in the immediately preceding paragraph, with the pupil
behavior-teacher emotional stability correlation also significantly
higher in elementary classes if the .10 level of significance were
employed. When male teachers and their classes were compared
(elementary Sample B and secondary Sample E), none of the differences
between z_r values attained the .05 significance level, although the
differences between the correlations involving the understanding,
friendly; responsible, businesslike; and child centered educational
viewpoints on the part of teachers would have been significant had the
.10 level been employed. Only two of the z_r differences (those
involving understanding, friendly and responsible, businesslike
teacher characteristics) met the .05 criterion when the relatively
smaller independent samples of male and female teachers and their
classes, Samples C and F, were compared; the difference of .23 between
the elementary and secondary pupil behavior-teacher emotional
stability z_r values was the next largest difference, but it would
have been significant only had the .10 level been accepted.

Comparison of correlations involv- 
ing male and female teachers. As a fi- 
nal note, added in light of frequently 
expressed interest in the sex of the 


teacher as a variable in the teacher- 
pupil relationship, it may be noted 
that the pupil behavior-teacher char- 
acteristics correlation coefficients ob- 
tained from samples of classes of fe- 
male teachers and those obtained from 
samples of classes of male teachers 
generally are in the same direction, 
and although they vary slightly, such 
variation does not exceed that which 
might be attributed to random sam- 
pling. (When the correlations for Sam- 
ples A and B, and similarly those for 
Samples D and E, were compared, 
none of the differences in correlation 
for men vs. women was significant,
even at the .10 level.) The present data
do not suggest systematic differences 
between men and women teachers in- 
sofar as the relationship between the 
teacher characteristics and pupil be- 
havior estimated here is concerned. 


SUMMARY 


It was hypothesized that (a) certain 
teacher traits (characteristics) esti- 
mated from scores yielded by a self- 
report type inventory, the Teacher 
Characteristics Schedule, would be 
positively correlated with pupil be- 
havior in the teachers’ classes, and 
(b) pupil behavior-teacher character- 
istic relationships would be more in 
evidence in elementary than in sec- 
ondary school classes. 

Previously derived and cross-vali- 
dated scoring keys were applied to 
teachers’ responses to the Teacher 
Characteristics Schedule. Pupil behav- 
ior was directly observed and assessed, 
for intact classes, by trained observers. 

Elementary school data suggest moderate relationships between
assessments of observed purposeful and productive pupil behavior and
inventory estimated teacher characteristics identified as
“understanding, friendly behavior”; “organized, businesslike
behavior”; and “original, stimulating behavior”; and low-order
relationships
between pupil behavior and teachers’ 
“favorable attitudes toward demo- 
cratic classroom procedures,” “child 
centered or permissive (liberal) edu- 
cational viewpoints,” “favorable atti- 
tudes toward pupils,” and “emotional 
stability.” 

Secondary school data suggest sub- 
stantially fewer relationships between 
the assessments of productive pupil 
behavior and inventory estimated 
teacher characteristics. The only gen- 
eralizable relationships seem to be 
those between pupil behavior and in- 
ventory estimated “original, stimulating
teacher behavior,” and, possibly,
between pupil behavior and inventory 
estimated “organized, businesslike 
teacher behavior.” 


Relationships between pupil behav- 
ior and inventory estimated teacher 
characteristics were less discernible in 
secondary school classes as compared 
with elementary school classes. 

No significant differences were ob- 
served between the pupil behavior- 
teacher characteristic correlations of 
men teachers as compared with women 
teachers.

RELATIONSHIP OF INTELLIGENCE TO STEP SIZE
ON A TEACHING MACHINE PROGRAM¹
CARLETON B. SHAY²

University of California, Los Angeles 


Teaching machine programs have 
generally been written with little con- 
sideration for different levels of intel- 
lectual ability. The programed teach- 
ing method itself has been considered 
sufficient to overcome ability differ- 
ences, because most programs have 
been written with minimal step size, 
allowing progression through the pro- 
grams with little error and maximum 
reinforcement. Skinner (1958) sug- 
gests 
a program designed for the slowest student 
in the school system will probably not seri- 
ously delay the fast student, who will be free 
to progress at his own speed ... (p. 976). 


On the other hand, since the intellec- 
tually superior student should be able 
to think in larger steps than one less 
intelligent, it seems likely there is a
relationship between intelligence and 
step size. Experimental evidence on 
such a relationship is meager: Porter’s 
(1959) study with spelling words, 
which found no relationship between 
IQ and achievement, supports the con- 
tention that one program is sufficient 
for different ability levels; but results 
of a study by Briggs and Besnard 
(1956), using more complex materials, 
suggest that programs suited to differ- 
ent ability levels might be desirable. 

The present study investigated the 
hypothesis, in null form, that there is 
no relationship between intelligence 
and step size on a teaching machine 
program for each of the following criteria: total learning, learning
of “rote” materials, learning of materials involving “understanding,”
errors, and time to complete the program.

¹ This article is based on the author’s doctoral dissertation at the
University of California, Los Angeles, under the direction of E. R.
Keislar.

² Now with the Santa Monica, California, city schools.

Size of item step was defined, after 
Lumsdaine (1959), as the “difficulty 
of giving the correct answer,” and in- 
ferred from measurement of the num- 
ber of errors made on a program. The 
proportion of errors on a program gives 
a difficulty level from which the aver- 
age probability of correct response 
may be calculated thus: 

$$p\{R_i\} = 1 - \frac{\sum_{i}\sum_{j} E_{ij}}{N_i N_j}$$

where

$p\{R_i\}$ = average probability of correct response R on items i;
$E_{ij}$ = number of errors made by the jth individual on the ith item;
$N_j$ = sum of the j individuals; and
$N_i$ = sum of the i items.

The probability of responding correctly to any given item is:

$$p\{R_i\} = 1 - \frac{\sum_{j} E_{ij}}{N_j}$$

Both the criteria of individual and 
average item probabilities of correct 
response were used to adjust and eval- 
uate step size in the programs used in 
this study.
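The two criteria can be illustrated with a brief sketch. It reads the definition above as one minus the proportion of errors, and it assumes that at most one error is scored per individual per item; both points are assumptions for illustration, as are the function names and the small error matrix in the usage note.

```python
def item_probabilities(errors):
    """Probability of correct response for each item.

    errors[j][i] is the number of errors (0 or 1 under the assumption
    above) made by individual j on item i.
    """
    n_individuals = len(errors)
    n_items = len(errors[0])
    return [
        1 - sum(row[i] for row in errors) / n_individuals
        for i in range(n_items)
    ]


def average_probability(errors):
    """Average probability of correct response over all items and individuals."""
    n_individuals = len(errors)
    n_items = len(errors[0])
    total_errors = sum(sum(row) for row in errors)
    return 1 - total_errors / (n_items * n_individuals)
```

For a made-up matrix of two individuals and four items, `[[0, 1, 0, 0], [0, 1, 1, 0]]`, the item probabilities are 1.0, 0.0, 0.5, and 1.0, and the average probability is 1 − 3/8 = 0.625. An item such as the second one, falling below the individual item criterion, would be a candidate for revision.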


METHOD 


Programs 


Three programs were written to cover the fourth grade unit in roman
numerals.³ This unit presents the symbols and principles for
construction of roman numerals to 399 (Brueckner, Merton, &
Grossnickle, 1957). The programs with few exceptions, however, did not
venture beyond 100.

A large step program was written first and given to successive small
(N ~ 4) samples of above average (IQ >110) pupils, each sample
followed by item revision until the average probability of correct
response was >90%, and most individual item probabilities of correct
response were ≥80%. This large step program was similarly reduced in
step size for samples of average students (IQ 90-109) to form a medium
step program; this, in turn, was revised to form a small step program
for below average pupils (IQ < 89). Step size was reduced by the
insertion of bridging items, splitting an item into two or more
simpler ones, or by rewording stems or responses to provide more
support to the slower student.

The three programs were next administered to a sample of 90 students,
an equal number at each ability level and with each program. The step
size was then readjusted to form the final experimental programs. They
contained 103 (large step), 150 (medium step), and 199 (small step)
items, and were duplicated and ring-bound to form small booklets
approximately 4” × 6” in size. The first three items in each program
were used to instruct the response procedures and were not considered
in the error count or timing. With decreasing step size programs were
less difficult and contained more review items and more varied
examples illustrating the principles of roman numerals construction.

³ The programs are available with the author’s dissertation (Shay,
1960) through interlibrary loan. They are under separate cover,
however, and should be requested specifically.


Apparatus

Apparatus 


Twelve identical response devices were used. Each device consisted of
a board faced with aluminum foil upon which was clamped a standard IBM
answer sheet and an IBM matrix (key) faced with aluminum foil and
punched for the correct answers. The student responded by punching
through the answer sheet with a stylus, completing an electrical
circuit to either a green or red light, informing the subject of his
success.


Subjects 


Ninety subjects were chosen from the low fourth grade in four Los
Angeles elementary schools on the basis of roman numerals pretest and
group intelligence test scores. One additional subject was dropped
because of absence, and a second because of complete failure to follow
directions. A third of the subjects were above average in intelligence
(IQ >110), one-third average (IQ 93-109), and one-third below average
(IQ <92). Adjustment of the customary lower limit of the average range
was necessitated by a lack of acceptable low ability students.


Pretest and Posttest


The pretest consisted of 24 items, 12 to be written as roman numerals
from arabic, and 12 vice versa. They were all numbers from 1-39 except
for L, C, D, M, 40, and 44. Only the number 8 was common to both
lists.

The posttest consisted of 31 items: 10 arabic numbers to be written in
roman numerals, 14 of the opposite type, and a sequence which required
counting by 100s to 1,000 in roman numerals, with C, D, and M given.
The first two parts mentioned were arranged with old items first,
followed by new numbers not taught in the programs. The old items were
those numerals specifically taught and presumably not requiring an
understanding of roman numerals construction for recall. The term
“rote,” as applied to these items, refers only to the minimal amount
of understanding necessary for learning and not to the method of
presentation. In contrast, the new items had not been taught and could
be formulated only through an understanding of these principles. Most
of the new items involved inversions (as in 4 and 9) and were numbers
between 100 and 1,000. There were 12 old and 18 new items. Reliability
of the posttest, estimated by the application of Kuder-Richardson
Formula 20 to the scores of … subjects, was as follows: total … .83;
new … old items portion, and … portion, .90.
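For readers unfamiliar with the reliability estimate named above, Kuder-Richardson Formula 20 can be sketched as follows. The sketch assumes items scored 1 (correct) or 0 (incorrect) and uses the population variance of total scores, which is one common convention; the function name `kr20` and the response matrix in the usage note are hypothetical and are not data from the study.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    responses[s][i] is 1 if subject s answered item i correctly, else 0.
    Uses the population variance of total scores (an assumed convention).
    """
    n_subjects = len(responses)
    n_items = len(responses[0])

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects

    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_subjects
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)
```

For the made-up matrix `[[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]` the total score variance is 1.25 and the sum of item variances is 0.625, so the coefficient is (3/2)(1 − 0.5) = 0.75.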


Procedure 


The pretest was administered to all students in each 4B class. Group
test IQs were taken from the records of students whose pretest scores
were from 4 to 20, and subjects assigned to ability groups. Subjects
within each ability group were randomly assigned to one of the three
programs to form nine experimental groups of 10 subjects each. These
groups, explicitly defined in Table 4, were formed from each possible
pairing of ability group and program.

Subjects worked on the programs in groups of 7 to 12, typically
including members from 5 or 6 of the experimental groups. In the first
experimental session for each group, subjects were instructed in the
response procedures with the first three items. Each subject completed
50 items a day on consecutive school days until he finished his
program; this was followed immediately by the posttest. Performance
measures obtained for each subject included old and new items posttest
and total posttest scores, total errors, and time necessary to
complete the program.

RESULTS 

Tables 1 and 2 indicate the extent 
to which the programs met the criteria 
of average and individual item diffi- 
culties for the experimental groups. 

Values in the lower-left to upper- 
right hand diagonal in each table, 
representing the criterion ability 


TABLE 1
Mean Number of Errors and Mean
Probability of Correct Response

Ability Group 


Above 
Average 


Program Below 


Average Average 


Large step 
M 36.6 
p{R;} 63.4 
Medium step 
M 34.0 
p{R,;} 76.9 
Small step 
M 22. 
PIR;} 88. 


TABLE 2
Number and Percentage of Items Whose Correct Response Probability
Equals or Exceeds 80%

[Table body not recoverable. Rows: large step, medium step, and small
step programs, each reporting N and %; columns: ability levels. The
only legible entry is N = 90 in the large step row.]

group-program combinations, suggest
that the programs were adequate for 
the study. Additional examination of 
the location of items within each pro- 
gram not meeting the individual item 
difficulty criterion revealed a slight 
tendency in the small step program for 
these items to cluster. This trend was 
not considered detrimental to the sub- 
jects’ responses or to the further in- 
terpretation of data. 

Desirability of covariance analyses 
of criterion scores with pretest scores 
as a control variable was indicated by 
the following correlations: pretest-IQ, 
.33; pretest-total posttest, .66; pretest- 
new items posttest, .59; pretest-old 
items posttest, .66; pretest-errors, 
—.51; and pretest-time, —.11. 

Covariance analyses of criterion 
scores are summarized in Table 3. The 
assumption of homogeneity of regres- 
sion was tested in each case and found 
to be satisfied. Inverse sine transfor- 
mations of percentage error scores, 
and logarithmic transformations of 
time scores were used to satisfy the 
assumption of homogeneity of vari- 
ance. The hypothesis that there is no 
relationship between intelligence and 
size of item step could be rejected only 
in the case of percentage error, where 
an interaction was found (p < .05). 
The main effect of ability level was 
obtained for each of the other criterion 
scores, but no interaction. 
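The two variance-stabilizing transformations mentioned above can be sketched briefly. The text does not give the exact forms used, so the sketch assumes the common arcsine-square-root form for percentages and base-10 logarithms for time; both choices, and the function names, are assumptions.

```python
import math


def arcsine_transform(percent_error):
    """Inverse sine transformation of a percentage (0-100), in radians."""
    return math.asin(math.sqrt(percent_error / 100.0))


def log_transform(minutes):
    """Logarithmic transformation of a time score (base 10 assumed)."""
    return math.log10(minutes)
```

The arcsine form stretches the ends of the percentage scale, where variances are otherwise compressed, and the logarithm shortens the long upper tail typical of time scores; either effect makes the group variances more nearly equal before the covariance analysis.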

Adjusted mean criterion scores for 
each experimental group on each of 
the variables—new, old, and total 
posttest, time, and errors—are given 
in Table 4.


DISCUSSION 


Failure to reject most of the null 
hypotheses asserted in this study sup- 
ports Skinner’s position that it is not 
necessary to provide more than one 
program on the basis of different initial 
ability. Table 4 suggests that at all 

TABLE 3

COVARIANCE ANALYSIS OF CRITERION SCORES USING
A ROMAN NUMERALS PRETEST AS A CONTROL


Mean Square 


Source 


New items 


28.21) 8 
382.61 160.01'48 .56 
68.62) 26.1514 
36.70 19.44) 6.24 


Programs 
Ability 
Interaction 
Within cells 
* .05 level of significance
** .001 level of significance


percentage 


Total posttest 
posttest 
errors 


rn 
E 


l 
7. 
2 


| Log time 
Log time 


= 
— 


80 (1.45 
43**|8.23°* 
87 (1.35 


.93) 1 
3610 
02 


.07 


501 

25/1 

56 
59.88 


t 


TABLE 4 


ADJUSTED MEAN CRITERION SCORES AND


Total 


Experimental group posttest


Ability Program M 
Large step 
Medium step 
Small step 


Above Average 


Large step 
Medium step 
Small step 


Average 


Large step 
Medium step 
Small step 


Below Average 


levels of ability and for different types 
of material, as size of item step de- 
creases, there is an accompanying in- 
crease in posttest score and time to 
complete the program. 

The question naturally arises, why 
did the experiment fail to reveal a 
relationship between intelligence and 
size of step, assuming there is a rela- 
tionship? One possible reason is that 
the programs were inadequate. Per- 
haps a 90% probability of obtaining 
the correct response is too low. This is 
suggested by the inverse relationship 
found between posttest score and step 


STANDARD DEVIATIONS 


Percentage 
error 


Time 
minutes 


Old items 
posttest 


New items 
posttest 


M 


49 
76 


90 


75.6 
90.6 


122 


“oe 


87.; 
108 
135.£ 


size. An argument against this expla- 
nation is the possibility that extrane- 
ous factors produced a portion of the 
error, and the “true” error rate was ac- 
tually lower. Machine malfunction was 
known to cause some error, which 
varied according to the subject’s re- 
action to occasionally seeing both 
lights at once. Also, some subjects ex- 
hibited a certain confusion between an 
answer and the number of the answer. 
For example, if the question was 
“What is 5 in roman numerals?” and 
the answer V appeared at Choice Posi- 
tion 3, some subjects would answer at 
Choice Position 5 rather than 3. (In
future experiments, lettered response 
positions are recommended.) 

A second possible reason for not 
finding a relationship between intelli- 
gence and step size is that IQ did not 
serve as an adequate predictor of 
learning ability in this study. Studies 
generally show only a moderate rela- 
tionship between learning ability and 
IQ. Perhaps group intelligence tests 
are less effective predictors of learning 
ability where programed teaching se- 
quences are involved than is the case 
in the ordinary classroom. Since pro- 
grams are carefully regulated in terms 
of step size, reading level, sequencing, 
etc., it may be that general intelligence 
as it is usually measured plays a 
smaller role in this type of learning 
than in the classroom, where these
variables are not controlled. An implication
of this is that other, perhaps
more specific, measures of ability
should be used to predict learning
from programed teaching.

A third explanation is the possibility
that a “real” relationship was obscured
by the confounding of certain pretest
relationships and the experimental design.
Considering the correlations between
pretest-IQ and pretest-posttest,
and assuming that IQ also contributed
to the variance in posttest scores, the
use of pretest scores as a control variable
reduced the contribution of IQ
to the variance of the criterion variables.
An implication of this explanation
is that to test the hypothesis
adequately, it would be necessary to
use some material, such as an artificial
language, whose pretest scores would
be essentially equal and unrelated to
IQ. A further reason for not obtaining
significant interactions is the possibility
that 10 cases per cell were too
few to reveal a relationship.

Another explanation deals with the
definition of step size. If step size had
been defined in terms of the amount of
language or prior experience necessary
to make the transition from one step
to the next, it is probably more likely
that interactions would have been
found.

Now let us assume that in fact there
is no relationship between ability and
step size. This implies, as Skinner suggests,
that for programed teaching a
minimally small step program is the
most appropriate for all levels of ability.
This would further suggest that if
provision for individual differences is
to be made, it should be made on some
basis other than intelligence alone.
Such a provision is indicated by the
time data, perhaps. Comparing large
and small step programs, bright students
took almost twice the time to
improve their score 1.7 points, an 8.9%
increase. In some situations, such a
gain may not be worth the additional
expenditure of time, and the use of
more than one program would be indicated.

It would be desirable to explore the
hypothesis of this study further, with
the following differences: more subjects;
easier programs; different learning
material, such as an artificial language;
different age groups; and use of
different predictors of “learning ability,”
from general to task-specific.
SUMMARY 


This study examined the hypothesis 
that there is no relationship between 
intelligence and size of item step on a 
teaching machine program under the 
criterion conditions of total learning, 
learning involving rote materials, 
learning of materials involving under- 
standing, errors, and time to complete 
the program. Step size was defined 
as the difficulty of giving the correct 
answer and measured by two criteria 
of error.





INTELLIGENCE AND TEACHING MACHINE PROGRAMS 


Three programed teaching sequences
of 103, 150, and 199 items were developed
covering fourth grade roman
numerals. Each was written for a given
level of ability, and step size adjusted
to meet stated criteria. Ninety fourth
graders were selected on the basis of
pretest and intelligence test scores.
From each of three ability levels, three
groups of 10 subjects were randomly
selected and each group assigned to
one of the programs, making nine experimental
groups. Subjects completed
50 items a day on successive days until
program completion, which was immediately
followed by the posttest.
None of the tested null hypotheses
could be rejected except in the case of
percentage error score. The results,
within the definition of step size used,
indicate that if there is a relationship
between intelligence and step size, it
is not a strong one. This would suggest
that alternate programs are not necessary
on the basis of ability alone. Further
studies were suggested.
TRANSFER OF VERBAL MATERIAL ACROSS SENSE MODALITIES

PAUL PIMSLEUR AND ROBERT J. BONKOWSKI¹


University of California, Los Angeles 


One of the main arguments against 
the teaching of the spoken foreign lan- 
guage in high schools and colleges 
maintains that there is not enough 
time in the average course in which to 
teach both speaking and reading. 
However, some writers contend that 
students who learn first to speak the 
language will, within the normal 
course time, catch up to or perhaps 
even surpass in reading ability those 
who have been taught reading all 
along. The argument of those who es- 
pouse this view will be bolstered if it 
can be demonstrated experimentally 
that aural learning facilitates visual 
learning. 

Experiments bearing directly on the 
transfer of verbal material across the 
modalities of vision and audition are 
rather scarce in the literature of ex- 
perimental psychology. Weissman and 
Crockett (1957) demonstrated that 
transfer does occur from auditory 
training to visual discrimination. Post- 
man and Rosenzweig (1956) suggested 
that transfer from visual training to 
auditory discrimination is greater than 
conversely. However, in both of the 
above studies the investigators were 
concerned with the thresholds of rec- 
ognition of verbal material. 

The present study addressed itself
to several questions raised by the
controversy over the teaching of languages:
(a) Does aural learning facilitate
visual learning? (b) Is this facilitation
greater than that achieved by
presenting the material first visually
and then aurally? (c) In terms of total
time, is it more economical to teach
by the aural-visual order than by the
visual-aural order (i.e., does the student
take fewer total trials to learn
verbal material both visually and aurally
when the material is presented
first aurally and then visually)? Affirmative
answers to these questions
would lend some support to those favoring
the aural approach to language
teaching.

¹ The authors wish to express their gratitude
to Norman H. Anderson for his helpful
suggestions and comments on a draft of
this paper.


METHOD 


Subjects. The subjects were 28 paid 
volunteers recruited from among students 
enrolled in second semester Spanish courses 
(Spanish II) at UCLA. The group consisted 
of 10 females and 18 males. 

Design. A treatments X levels design 
(Lindquist, 1956, pp. 121-149) was used. 
The subjects were first divided into two 
levels: those who had received a grade of 
A or B in Spanish I at UCLA (14); and 
those who had received a grade of C or D 
in Spanish I at UCLA (14). At each level, 
subjects were randomly assigned to one of 
two conditions by the use of a table of random
numbers.

Procedure. In the present design there
were two treatments. The subjects learned 
a list of paired associates through one mo- 
dality, and relearned the same list through 
another modality. Original learning and re- 
learning were continued to a criterion of two 
consecutive errorless trials. Group V-A 
learned first through a visual presentation 
of the list and relearned through an aural 
presentation of the list. Group A-V received 
the opposite order of presentation of the 
same list: aural original learning and visual 
relearning. For the purposes of the present 
study, these conditions enabled each group 
to serve as a control for the other. The 
original learning scores of one group could 
be compared with the relearning scores of 
the other group as in a simple transfer 
paradigm.

The list consisted of 10 nonsense dissyllables,
each paired with one of 10 color
names. The dissyllables, from Dunlap’s 
(1933) list, were chosen to meet require- 
ments of ease of spelling, ease of recognition 
aurally and visually, and similarity in con- 
struction and pronounceability to English 
words. The stimulus-response pairs are listed 
in Table 1. 

In the visual presentation the stimulus 
word was projected on a screen for 1 second 
and then removed. After a 4-second interval 
the response word was projected for 1 sec- 
ond. There was a 4-second interval between 
pairs. In the aural presentation the pairs 
were presented by tape recorder, with a 4- 
second interval both between stimulus and 
response and between pairs. Since a word 
took about 1 second to pronounce, the time 
intervals were comparable. 
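
The comparability of the two presentation schedules can be checked with a small worked calculation. The sketch below is illustrative only: the function name is hypothetical, it assumes the 4-second between-pairs interval is counted after every pair (including the last), and it takes the spoken duration of a word to be the approximate 1 second stated above.

```python
def trial_seconds(n_pairs=10, stimulus=1.0, gap=4.0, response=1.0, between=4.0):
    """Approximate length of one trial through the list, in seconds.

    Each pair contributes: stimulus exposure, stimulus-response interval,
    response exposure, and the interval before the next pair.
    """
    return n_pairs * (stimulus + gap + response + between)


# Visual condition: 1-second projections with 4-second intervals.
visual_trial = trial_seconds()

# Aural condition: each recorded word is taken to last about 1 second.
aural_trial = trial_seconds(stimulus=1.0, response=1.0)
```

Under these assumptions both conditions run to about 100 seconds per trial, which is the sense in which the intervals were comparable.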

Using the anticipation method in each 
condition, the subject responded by writing 
his response on a line of a record sheet using 
a strip of cardboard to cover previous re- 
sponses. The 10 responses in each trial were 
recorded on successive sheets in a booklet.
The 10 pairs were presented in different 
random orders for successive trials in a con- 
dition, but the orders were the same between 
conditions. 
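
One way to obtain presentation orders that differ from trial to trial yet are identical in the two conditions is to draw them from a seeded random number generator. This is only a sketch of the idea, not the authors' procedure: the seed, the function name, and the use of index numbers 0 through 9 in place of the actual pairs are assumptions made for illustration.

```python
import random

N_PAIRS = 10  # the list contained 10 paired associates


def trial_orders(n_trials, seed=1961):
    """Return one random ordering of the pair indices per trial.

    Using the same seed reproduces the same sequence of orders, so both
    conditions can be given identical orders trial by trial.
    """
    rng = random.Random(seed)
    return [rng.sample(range(N_PAIRS), N_PAIRS) for _ in range(n_trials)]


visual_orders = trial_orders(5)
aural_orders = trial_orders(5)
```

Because both calls start from the same seed, the first trial in the visual condition uses the same order as the first trial in the aural condition, and so on for later trials.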

The subjects were run in small groups 
ranging from two to four in order that the 
experimenters could maintain control and 
prevent cheating. The subjects were in- 
structed to leave the room quietly after 
having reached criterion in order to prevent 
additional learning by subjects who finished 
first. The relearning phase of a condition 
was begun as soon as all subjects being run 
at one time had reached criterion in the 
original learning phase. 

The subjects were instructed to learn the 
“meanings of 10 foreign words.” Virtually 
the same instructions were given before 
original learning and relearning so that sub- 
jects did not know in advance that the “10 
foreign words” and their “meanings” in the 
relearning phase were the same as in the 
original learning phase. 


RESULTS AND DISCUSSION 

The dependent variables were trials 
and errors to criterion. The correla- 
tions between these two measures were 
.89 for the original learning phase and
.90 for the relearning phase. Statistical 
tests based on both measures yielded 
similar results. Therefore, only the 
data for number of trials are reported. 


TABLE 1
TEN PAIRS OF DISSYLLABLES
AND COLOR NAMES

fimur    yellow        polef    green
runil    brown         medon    purple
kupod    white         defig    red
tarup    black         nigat    pink
latuk    orange        gokem    blue


TABLE 2 


MEAN TRIALS TO CRITERION IN ORIGINAL
LEARNING AND RELEARNING


Original Learning Relearning 


Spanish I Grade 
Group 
V-A 


Group 
A-V 


Group | Group 
V-A A-V 


6.86 | 2.71 
9.14 14 


AorB 
Cor D 


8.86 


Group Mean | 11.71 | 8.00 | 3.43 | 2.43 
Group SD -67 | 2.78 .87 .62 


The data for the analysis of amount 
of transfer are summarized in Table 2. 
Comparing the relearning scores for 
each group with the original learning
scores of the other group suggested 
that there was a marked positive 
transfer across modalities. Aural origi- 
nal learning facilitated visual relearn- 
ing and the converse was also true. 
The facilitation of visual learning by 
prior aural learning may be viewed as 
a finding in favor of the aural ap- 
proach to language teaching. 

Stevens’ (1951, p. 557) formula for 
the savings score was applied in the 
analysis of the differential degree of 
transfer between the two conditions. 
The mean savings scores for Groups 
V-A and A-V were 89.5 and 94.6, re- 
spectively. This showed a tendency for 
subjects in the A-V condition to “save 
more” than subjects in the V-A condi- 
tion. However, the savings score is to 
some extent dependent upon the cri- 
terion set by the experimenter. There- 
fore, an attempt was made to get a 

TABLE 3
SUMMARY OF ANALYSES OF VARIANCE OF
ORIGINAL LEARNING AND OVERALL DATA

                          Original Learning      Overall (OL + RL)
Source              df       MS        F            MS        F
Sense Modalities     1      96.59     8.27**      155.64     5.86*
Grades               1     112.01     9.59**      165.20     6.22*
SM X G               1      20.56     1.76         36.65     1.38
Error               24      11.68                  26.56

* Significant at .05 level.
** Significant at .01 level.



more meaningful analysis of the dif- 
ferential degree of transfer. 
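
The dependence of a savings score on the criterion can be illustrated with a short calculation. The exact form of Stevens' formula is not reproduced in the text, so the sketch below assumes the familiar percent-savings form, with an optional correction that subtracts the criterion trials from both phases; the function name and the example numbers are hypothetical and are not the study's data.

```python
def savings_score(original_trials, relearning_trials, criterion_trials=0):
    """Percent savings in relearning relative to original learning.

    Assumed form: 100 * (OL - RL) / OL, where the criterion trials may be
    subtracted from both OL and RL before the ratio is taken.
    """
    ol = original_trials - criterion_trials
    rl = relearning_trials - criterion_trials
    if ol <= 0:
        raise ValueError("original learning must exceed the criterion trials")
    return 100.0 * (ol - rl) / ol


# Hypothetical subject: 10 trials to learn, 3 trials to relearn.
uncorrected = savings_score(10, 3)
corrected = savings_score(10, 3, criterion_trials=2)
```

With these made-up numbers the uncorrected score is 70.0 and the corrected score is 87.5, which shows how the same performance yields different savings values depending on how the criterion trials are treated.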

In the original learning phase subjects
at both levels of Group V-A
appeared to have taken more trials to
criterion, on the average, than their
counterparts in Group A-V. Similarly,
looking at the performance of subjects
in each group suggested that the overall
mean for Group V-A was greater
than that of Group A-V in the original
learning phase. An analysis of variance
was employed to test the statistical
significance of these differences.
A summary of this analysis appears in
Table 3. The significant F for Sense
Modalities suggests that the visual
learning was more difficult than the
aural learning in the original phase.
This finding indirectly supports the
conclusion that there was more transfer
in the A-V condition than in the
V-A condition. For if one can agree
that the visual task was in fact more
difficult, one would not expect subjects
in the A-V condition to take a smaller
mean number of trials to criterion in
the relearning phase. Nevertheless,
subjects in the A-V condition obtained
a mean of 2.43 trials for the relearning
phase as opposed to a mean of 3.43
trials for subjects in the V-A condition.
It should be stressed that this is
offered only as indirect support of the
conclusion that there was greater
transfer in the A-V than in the V-A
condition.

The most significant finding, and 
perhaps the strongest support for those 
advocates of the aural approach to 
language teaching, resulted from the 
analysis of the overall (OL + RL) 
scores. Combining the number of trials 
for each subject in both phases of the 
experiment yielded scores that gave 
some indication of the effectiveness of 
the order of presentation of verbal ma- 
terial. A summary of the analysis of 
variance of these data is given in the 
last column of Table 3. 
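
The F ratios reported in Table 3 can be checked directly from the mean squares, since each F is the effect mean square divided by the error mean square. The sketch below uses the mean squares printed in Table 3; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Mean squares as printed in Table 3.
MEAN_SQUARES = {
    "original": {
        "Sense Modalities": 96.59,
        "Grades": 112.01,
        "SM x G": 20.56,
        "Error": 11.68,
    },
    "overall": {
        "Sense Modalities": 155.64,
        "Grades": 165.20,
        "SM x G": 36.65,
        "Error": 26.56,
    },
}


def f_ratios(mean_squares):
    """Divide each effect mean square by the error mean square."""
    error = mean_squares["Error"]
    return {
        source: round(ms / error, 2)
        for source, ms in mean_squares.items()
        if source != "Error"
    }


original_f = f_ratios(MEAN_SQUARES["original"])
overall_f = f_ratios(MEAN_SQUARES["overall"])
```

The recomputed ratios agree with the tabled F values to two decimal places, with 1 and 24 degrees of freedom for each effect.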

The significant F for Sense Modali- 
ties in this case suggests that subjects 
in the A-V condition took on the aver- 
age fewer trials to both criteria than 
subjects in the V-A condition. Hence, 
it took the students fewer total trials 
to learn the verbal material both visu- 
ally and aurally when the material 
was presented first aurally and then 
visually. This has the implication that 
it might be more economical, in terms 
of time, to teach verbal material by 
the aural-visual order than by the 
visual-aural order. 

The significant F for Grades both in
the original learning and the overall
data suggested that those subjects
who achieved a grade of A or B in
Spanish I took fewer trials to criterion,
on the average, than those subjects
who achieved a grade of C or D. In
itself, such a result might have been
predicted by the experienced language
teacher. However, the Grades effect
is of importance also in considering
the methodology of transfer studies
similar to the present study. The design
used afforded a neat method of
matching the groups as well as increasing
the efficiency of the experiment.
In fact, had the groups not been so
matched, the estimate of the error
mean square would have been increased.
The inflated error mean
square would have reduced the
chances of obtaining a significant
Sense Modalities effect.

On the basis of the results of this
study the investigators concluded:
aural learning does facilitate visual
relearning, this facilitation is greater
than the facilitation of aural relearning
by visual original learning, and it
takes less time to teach verbal material
both visually and aurally when
the material is presented first aurally
and then visually.

SUMMARY 


A list of 10 paired associates (dis- 
syllables as stimuli and color names 
as responses) was randomly presented 
first through one modality and then 
through another modality. Half the 
subjects learned the list first through 
the visual modality and then relearned 
it through the auditory modality. The 
other half learned the list in the opposite
order.

Positive transfer was found in both
directions. It was suggested that the
aural presentation had a greater facilitating
effect upon the visual presentation
than conversely. The subjects
took fewer total trials to learn verbal
material both visually and aurally
when the material was presented first
aurally and then visually. These findings
seemingly offer some support for
the view that aural instruction preceding
visual instruction may have
advantage over conventional methods
of language teaching if the goal is to
achieve proficiency in both reading
and aural comprehension.
REPRESENTATION OF STIMULI

Within an association framework,
concept formation has been conceived
of as the process by which a group of
stimuli come to consistently elicit the
same response. Obviously, one important
factor determining the rate of
concept formation, then, would be the
ease with which the stimuli elicit the
common response.

In order to study the formation of 
verbal concepts, Underwood and Rich- 
ardson (1956a) scaled a series of com- 
mon nouns in terms of their dominant 
descriptive associations. Concept ma- 
terials were then formed by selecting 
groups of nouns having the same dominant
association. These were presented
to subjects who had to discover
and learn the correct associations.
Further studies (Freedman & Med- 
nick, 1958; Underwood, 1957; Under- 
wood & Richardson, 1956b) utilizing 
these materials have explored several 
variables affecting the rate of concept 
learning. 

In all of the studies cited above, 
the stimuli (the nouns) were all pre- 
sented verbally (i.e., in terms of the 
word representing the object). It has 
been suggested (Underwood, 1952) 
that other modes of presentation might 
have some effect on the rate of con- 
cept learning of the type described 
above. The purpose of the present 
study was to compare the learning of 
verbal concepts with pictorial repre- 
sentation of stimuli with the more 
standard verbal representation. 

It would seem that a case might be 
made for the superiority of either condition.
Verbal stimuli might be easier
because irrelevant parts of the pictures
could interfere with the production
or discovery of the correct
responses. On the other hand, with pic- 
torial representation, the correct as- 
sociation (a descriptive adjective) 
could be made quite apparent from 
the nature of the picture, while in the 
case of the verbal representation, the 
subject is free to “picture” the object 
any number of possible ways, and may 
picture it in such a way as to interfere 
with the discovery of the correct re- 
sponse. 

In order to provide data relevant 
to these hypotheses, three conditions 
were utilized in the present study: 
Verbal (the name of the object), Pic- 
ture Dominant (with the correct asso- 
ciation emphasized by the picture), 
and Picture Nondominant (with the 
correct association de-emphasized). 
Grade level was also incorporated as 
one variable in the design, as it seemed 
desirable to be able to extend and gen- 
eralize any results to include several 
educational levels if possible. 


METHOD 


Materials. The concepts and stimuli were 
chosen directly from Underwood and 
Richardson’s materials (1956a). These are 
presented in Table 1. Certain criteria were 
used in selecting concepts. Only high dominant
concepts were used, overlap and intralist
similarity were kept as low as possible,
and all nouns had to be easily represented in
pictorial form.

Three sets of the stimuli were hand drawn 
on 5” X 6” railroad board cards in black 
india ink. One set consisted of the words 
representing each noun. The second and 
third sets were line drawings of the objects.
In one set, the descriptive characteristic
which was to be the correct response was
emphasized by the drawing (Picture Dominant).
In the other set, this was de-emphasized
(Picture Nondominant). For example,
a bed was drawn as appearing soft and
billowy for one set, and looking like an army
cot for the other.

Subjects. Sixty high school students (15
at each of four grade levels: ninth, tenth,
eleventh, and twelfth) served as subjects
in this study.¹ Three groups of 20 subjects
each were formed with 5 subjects from each
grade level in each of the three groups. No
attempt was made to match groups, with
subjects being assigned in order of appearance.
The IQ level of all students ranged
from 90-110. Each group learned one set of
the concept materials.

Procedure. Each subject was run individually,
seated at a table facing the experimenter.
Stimuli were exposed manually
for 3 seconds with a 2-second interval between
them. Intervals were timed by an
electronic timer with an audible click. Instructions
were essentially the same as those
used by Underwood and Richardson (1956b).
The subject was told to respond during the
3-second interval with descriptive words, i.e.,
those expressing size, shape, texture, etc. The
instructions were not completely standardized
as it was necessary that each subject
understood the task. One example (Elephant-big)
was used in which examples of
descriptive words were given. The subjects
were also told that only one “response” was
correct and that the whole list took only
four responses. The experimenter said
“right” after all correct responses and said
nothing if the subject had not responded
correctly within 3 seconds. Each subject received
15 repetitions of the list, with the
cards being shuffled after each repetition.


RESULTS 

The mean number of correct responses
for each group over all 15
trials is shown in Figure 1. Analysis
of variance of these data showed that
both mode of presentation and grade
level were significant (F for mode of
presentation was 15.26; for grade level,
5.87). Individual t tests showed the
verbal instances produced better performance
than the picture emphasized
instances (t = 2.67, p < .01) and the
picture emphasized instances better
performance than the pictures de-emphasized
(t = 2.54, p < .02). As to
grade level, the only significant difference
was between Grades 11 and 12.
There was no interaction between
mode of presentation and grade level
(F = 1.29), even though the graph
suggests that the differences were
larger at the lower grade levels.

¹ Subjects for this study were provided
by Azusa High School, Azusa, California;
Nelson Price, Principal.


TABLE 1
CONCEPT MATERIALS USED

Concept    Instances
round      baseball, doughnut, barrel, spool
soft       bed, fur, pillow, moccasin
sharp      fang, fishhook, hatchet, knife
slimy      lizard, earthworm, oyster, snake


DISCUSSION 


The most important finding of this
study is that performance in concept
learning was better when the stimuli
were in the form of words than when
they were represented pictorially.
There are several interpretations, two
of which seem likely. First, the subject
is using the same medium as the
stimulus in making his response, i.e.,
he is looking at a word and responding
with a word. Since the words used
here are common ones, it is reasonable
to assume that the concept and stimuli
have been associated during the subject’s
prior experience. Hence, in emitting
the correct response, the subject
may not “form an image” at all, but
merely responds with a highly likely
verbal associate. Karwoski, Gramlich,
and Arnott (1944) came to somewhat
the same conclusion from their data
which show a longer reaction time to
objects than to words.

Fig. 1. Mean correct responses as a function of grade level for Verbal, Picture Dominant,
and Picture Nondominant instances.

The interpretation which seems
most plausible, however, attributes
these findings to the particular concepts
which were used. Such qualities
as “soft,” “sharp,” and “slimy” are
primarily tactual rather than visual.
It is very likely, then, that pictorial
representation of these objects suggested
visual associations such as
color, form, etc., or at least interfered
with the production of an association
of a nonvisual nature. One test of this
might have been to determine whether
the “wrong” associations or “first” associations
given to the instances of
these concepts were of this nature.
However, the mechanics of administering
the task precluded recording
these responses. Certainly, the hypotheses
advanced here would seemingly
predict that Underwood-Richardson
Dominance Values scaled to
the pictures would be different from
those obtained with words. Even this,
however, would not be crucial since
the Dominance Values do not necessarily
represent the relative strengths
of responses within a single individual,
but merely the percentage of the subjects
who gave a particular response
as the first association. Interfering response
tendencies might not be manifested
in these Dominance Values, but
could appear as lengthened response
time.

This interpretation suggests that 
broad generalizations regarding the 
relative effectiveness of pictures, ob- 
jects, and words as materials for con- 
cept formation will in all probability 
not be found. As is the case in many 
complex learning situations, the effects 
of a particular variable will probably 
be found to depend upon the specific 
nature of concepts to be learned. 

Mention perhaps should be made of 
the fact that these results are contrary 
to those obtained in a paired-associate 
task by Wimer and Lambert (1959).
They found learning to be superior to
object stimuli as opposed to word
stimuli and attribute these results to
the greater similarity of the verbal
stimuli. In reconciling the present find- 
ings with these results, two things 
might be pointed out: First, the simi- 
larity of the materials used here might 
differ considerably from that in the 
materials used by Wimer and Lam- 
bert who used actual objects, not pic- 
tures. Second, similarity does not act 
the same in concept formation studies 
as in paired-associate learning and, in 
fact, any similarity between instances 
of the same concept should actually 
facilitate acquisition (Richardson, 
1958). Thus, there is probably no in- 
compatibility of the present findings 
with those of Wimer and Lambert. 

The fact that the more advanced 
subjects performed better on this task 
probably may be attributed to increasing
mental age. There is, in addition,
some suggestion that the
differences between methods of presentation
are not as marked in the
twelfth graders as in the other groups.
Although, statistically speaking, the
conservative conclusion is that the
trend is unreliable, its regularity suggests
a real phenomenon. It could have
been produced by something in the
nature of the associations elicited by
the stimuli or perhaps by a more sophisticated
approach on the part of
the more advanced students. At any
rate, any explanation for this somewhat
questionable phenomenon must
await further exploration.


SUMMARY 


The formation of verbal concepts
was investigated in high school students
utilizing instances consisting of
words, pictures of objects with the correct
concept accentuated, or pictures
of objects with the correct concept de-emphasized.
Four different grade levels
were combined with the three
modes of presentation of the concepts
in a factorial design. The results
showed that concept learning was best
when the instances were presented as
words, followed by the picture stimuli
with the concept accentuated. The results
were attributed to the nature of
the specific concepts studied, and it
was concluded that in all likelihood no
simple answer to the question of which
type of presentation of instances is
superior can be obtained.

Although the use of marks and
grades in educational practice has re- 
ceived considerable attention in the 
research literature, little if any infor- 
mation is available about the effects 
of grades on the students to whom 
they are given (Odell, 1950). The 
present study was designed to examine 
the effect on attitudes of differential 
assignment of grades for performance 
on attitude related essays. 

The basic hypothesis guiding the 
study was that grades will serve to af- 
fect behavior in the form of a rein- 
forcing contingency. More specifically, 
a “good” grade should serve to effect 
repetition of the responses which it 
followed, while a “poor” grade should 
produce reduction in the potentiality 
of appearance of the preceding re- 
sponses. 

In the present study, students re- 
sponded to attitude scales and several 
weeks later were requested to write es- 
says on topics related to these atti- 
tudes, adopting a position incongruent 
with their measured attitudes. At ran- 
dom, grades were assigned to these es- 
says. It was predicted that students 
who received an A for their essays 
should change more in the direction 
of their essays on a subsequent atti- 
tude measure than a group that re- 
ceived a D. 

In applying the hypothesis of the 
reinforcing effects of grades to atti- 
tude changes in the present context, 
the formulation presented by Doob 
(1947) is being followed. Doob views 
an attitude as an implicit, anticipatory
response which mediates overt behav- 
ior but which in turn is derived from 
the reinforcement of overt behavior. 
For Doob, reward or avoidance of 
punishment may constitute the rein- 
forcing contingency. Accordingly, the 
reinforcement of attitude related, 
overt statements may be expected to 
be functionally related to changes in 
measured attitude. Evidence support- 
ing this view has been supplied by 
Scott (1957, 1959a, 1959b). The pres- 
ent study attempts to extend Doob’s 
formulation into the area of educa- 
tional practices. 

Scott has proposed the possibility 
that verbalization of a position op- 
posed to initial attitude in and of it- 
self may produce change with rein- 
forcement or nonreinforcement leading 
to stability or extinction of the new 
response. Janis and King (1954) 
showed that verbalization alone pro- 
duced change. The relative contribu- 
tion of verbalization alone in contrast 
to the effect of the consequent contin- 
gencies could not be examined by 
Scott because of the absence of a con- 
trol group which experiences no conse- 
quences of their verbalizations. In the 
present experiment, a group was in- 
cluded that received no grade follow- 
ing their essays. 


METHOD 


A 40-item questionnaire containing four
10-item attitude scales was administered
to 228 students enrolled in Communication
Skills classes at the State University of
Iowa. Subjects responded to each item on a
five-point continuum ranging from strongly
agree to strongly disagree. The four
attitude scales dealt with federal aid to
education, legalized gambling, capital
punishment, and socialized medicine.
Responses to the scale on federal aid to
education were extremely skewed and the
capital punishment scale proved to be
unreliable. These two scales were discarded
from further consideration and served
primarily as filler items. The initial
administration of the questionnaire was
conducted by the class instructors.

During a class period, approximately 6
weeks following administration of the
scales, the subjects were asked by the
experimenter to write essays on particular
assigned topics. The subject was instructed
by directions appearing at the top of a
sheet to write a brief essay supporting a
position on either legalized gambling or
socialized medicine. Scores on the attitude
scales determined the position that was
assigned. In each case, the subject was
instructed to write supporting the position
opposite to that indicated by his pretest
scale score, i.e., if the subject's scale
score indicated favorability to legalized
gambling, he was asked to write in
opposition to legalized gambling. The
assignment of a topic to a particular
subject was based on the strength of his
initial position on the scales. The scale
chosen was the one on which the subject had
assumed the strongest position relative to
the other. One-half hour was permitted for
the essay. On completion of the essays, the
experimenter promised to return grades on
the following day.
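
The assignment rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say how "strength" of position was measured, so the sketch uses distance from the scale midpoint of 30, and it assumes that scores above the midpoint indicate a favorable attitude. The function name, the handling of ties, and the handling of a score exactly at the midpoint are hypothetical.

```python
NEUTRAL = 30  # assumed neutral midpoint of a 10-item scale scored 10 to 50


def assign_essay(gambling_score, medicine_score):
    """Return (topic, side) for one subject.

    Assumptions, not taken from the paper: strength of position is the
    distance from NEUTRAL, and scores above NEUTRAL mean "favorable".
    """
    scores = {
        "legalized gambling": gambling_score,
        "socialized medicine": medicine_score,
    }
    # Pick the scale on which the subject is farthest from neutral.
    # On a tie, max() keeps the first entry (an arbitrary choice).
    topic = max(scores, key=lambda name: abs(scores[name] - NEUTRAL))
    # The essay must argue against the subject's measured position.
    # A score exactly at NEUTRAL is treated as unfavorable (arbitrary).
    favorable = scores[topic] > NEUTRAL
    side = "oppose" if favorable else "support"
    return topic, side
```

For example, a subject scoring 45 on gambling and 32 on medicine would, under these assumptions, be asked to write in opposition to legalized gambling.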

On a random basis, grades were assigned
to the essays. One third of the subjects
writing on each topic received a grade of A,
one third received a grade of D, and one
third was given no grade. This last group
was told when papers were returned that the
ratings were not completed due to
insufficient time. Immediately after the
essays and grades were returned, the
attitude questionnaire was readministered.
Finally, subjects were asked to indicate
their satisfaction with the essays. A total
of 127 subjects participated in all phases
of the study, of whom 58 wrote an essay on
legalized gambling and 69 wrote on
socialized medicine.
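
A minimal sketch of random assignment to the three grade conditions within one essay topic follows. The paper does not describe the randomization mechanics, so the shuffle-and-deal approach, the function name, and the seed parameter are assumptions made for illustration.

```python
import random

CONDITIONS = ["A", "D", "no grade"]


def assign_grades(subject_ids, seed=0):
    """Randomly split the subjects of one topic into three near-equal groups.

    Hypothetical helper: shuffles the subjects, then deals them out to the
    three conditions in rotation so group sizes differ by at most one.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: CONDITIONS[i % 3] for i, sid in enumerate(shuffled)}
```

Calling the function once per topic keeps the three conditions balanced within legalized gambling and within socialized medicine separately, as the procedure requires.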

Scores on each 10-item attitude scale were
computed by summing the responses to the
items, each of which was given a weight
from 1 to 5. The minimum possible score
for each scale was 10 and the maximum
possible score was 50. The dependent change
measure was derived by subtracting each
subject's score on the posttest from his
score on the pretest. A change in the
direction of the position taken on the
essay was given a positive sign. Change in
the opposite direction was negative. To
eliminate negative numbers a constant of 20
was added to all change scores.
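
The scoring just described can be illustrated with a short sketch. The text does not state which end of each scale is the favorable one, so the `essay_direction` argument (+1 if the essay argued toward the high end of the scale, -1 if toward the low end) is an assumption introduced here, as are the function names.

```python
CONSTANT = 20  # added to every change score, as stated in the text


def scale_score(item_weights):
    """Sum ten item weights, each from 1 to 5, giving a score from 10 to 50."""
    if len(item_weights) != 10:
        raise ValueError("each scale has exactly 10 items")
    if any(w not in (1, 2, 3, 4, 5) for w in item_weights):
        raise ValueError("item weights must be integers from 1 to 5")
    return sum(item_weights)


def change_score(pretest, posttest, essay_direction):
    """Signed pretest-to-posttest shift plus the constant.

    essay_direction is +1 or -1 (an assumed encoding). The shift is
    positive when the subject moved toward the position taken in the essay.
    """
    shift_toward_essay = (posttest - pretest) * essay_direction
    return shift_toward_essay + CONSTANT
```

Under this reading, a subject who moved from 42 to 35 after writing toward the low end of the scale receives a change score of 27, and a tabled mean of 25.85 corresponds to an average shift of 5.85 scale points toward the essay position.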


RESULTS 


It was predicted that subjects who 
were awarded an A would change on 
the average in the direction of their 
essays to a greater extent than sub- 
jects who were given a D. Table 1 
presents the data relevant to this pre- 
diction. Subjects receiving an A 
changed an average of 31.76 points in 
the direction of their essays while sub- 
jects who were given D changed 25.85 
points. This difference is significant
beyond the .01 level. Comparison of
the groups that received a grade with
the group that did not indicates
significantly greater change (p < .05)
for the subjects who received an A 
than for those given no grade, while 
no difference is suggested between the 
subjects obtaining a D and subjects 
receiving no grade. 
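
As a rough check on the comparison of the D and no grade groups, the sketch below computes a t statistic from summary values using an unpooled standard error, which is one common way of handling heterogeneous variances. Two assumptions are made and should be read as such: the exact procedure taken from Edwards (1960) is not described in the text, and the pairing of the N and SD values in Table 1 with the Grade D and No Grade columns (N = 47, SD = 7.84 and N = 38, SD = 8.03) is inferred from the damaged table, not stated.

```python
import math


def unpooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent means without pooling variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Assumed column assignment: no grade (27.11, 8.03, 38) vs. D (25.85, 7.84, 47).
t_no_vs_d = unpooled_t(27.11, 8.03, 38, 25.85, 7.84, 47)
print(round(t_no_vs_d, 2))  # 0.73
```

A value of this size is consistent with the reported absence of a difference between the D and no grade groups (p > .10).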

Analysis of mean change for each of 
the issues suggests similar results to 
those obtained in the overall analysis 
although the differences obtained were 
more striking for the socialized medi- 
cine issue than for legalized gambling. 
An analysis of mean change in relation 
to initial position indicates that those 


TABLE 1
MEAN ATTITUDE CHANGE^a

Change       Grade A     Grade D     No Grade
                         (N = 47)    (N = 38)

Mean         31.76       25.85       27.11
SD                       7.84        8.03

Differences between groups^b

                t        p
A vs. D:       2.84     <.01
A vs. No:      13       <.05
D vs. No:      oda      >.10

^a A constant of 20 was added to all change scores.
^b Because of heterogeneity of variance, t tests were
computed employing the procedure recommended by
Edwards (1960).
who had initially assumed a favorable
position on each of the issues (i.e., in 
favor of legalized gambling or social- 
ized medicine) changed significantly 
more (t = 4.73, p < .01) than those 
who were unfavorable. Nevertheless, 
the effect of grades on attitude change 
remains similar to that obtained when 
direction of initial position is not con- 
sidered. 

Subjects, in addition to responding 
to the attitude scale related to the 
topic on which they had written an 
essay, also responded to the scale for 
which a related essay had not been 
written. By comparing the change 
scores of subjects who had written an 
essay on a particular topic with those 
who had not written on that topic it 
is possible to evaluate the effect of es- 
say writing independent of the effect 
of grades.* The mean change obtained 
for subjects who had written a rele- 
vant essay was 28.24 in contrast to 
22.61 for subjects who had written on
the other topic. This difference is
significant (t = 3.27, p < .01), suggesting
that the writing of an essay, inde- 
pendent of grade received, produced 
change in attitude. 


DISCUSSION 


The results suggest support for the 
hypothesis that a “good” grade serves 
to reinforce the behavior for which it 
has been administered. Verbalization 
without a consequent contingency 
seems to lead to responses similar to 
those obtained when verbalization is 
followed by a “bad” grade. “Cognitive 
contact” with the opposing side in and 
of itself does not appear to produce 


* Comparison of the scores of each of the 
experimental groups individually with this 
nonessay writing “control” group would 
seem to be precluded because of potential 
unknown effects of simply receiving a grade. 
Collapsing across groups regardless of ex- 
perimental condition potentially randomizes 
such effects. 
effects approaching those obtained
when reward is forthcoming. 

Some insight into additional effects 
of the grades is provided by the re- 
sults associated with a question as to 
the satisfaction the subjects experi- 
enced with their essays. Of the 52 sub- 
jects who received an A, 42 indicated 
satisfaction with their essays. Twelve 
of the 57 subjects who received a D 
were satisfied with their essays. Of 
the subjects who received no grade, 12 
out of 57 were satisfied. A chi square 
of 41.99 (df = 2) indicated that this 
distribution is significant beyond the 
.001 level. It is apparent that no dif- 
ference in satisfaction with their es- 
says appears for those who received a 
poor grade and those who were given 
no grade. Both events appear in this 
instance to be functionally equivalent 
in their effects in contrast to the
effect of a good grade. The no grade
condition was an unusual one in class
practice and seemed to operate for the
subjects as a poor grade, perhaps due
to frustration when the expectancy of
receiving some grade was not fulfilled.

Some evidence is provided that ver- 
balization in an incongruent essay is 
effective in producing attitude change, 
independent of consequences. This 
supports the findings of Janis and 
King (1954) on the relation of role 
playing and attitude change. 

The experiment presents evidence 
only on the effect of grades on one as- 
pect of a complex set of behaviors ap- 
pearing in the essay writing situation. 
A qualified generalization can be of- 
fered that the administered grades af- 
fected in a similar manner many 
other unmeasured aspects of perform- 
ance in the situation, i.e., composi- 
tional skills, ideational patterns, affect 
concerning essay writing, etc. Further 
research is necessary to delineate those 
behaviors affected in grading
situations. Educational practice will profit
from understanding of the functional 
role of grades for other than descrip- 
tion of assessment procedures. 


SUMMARY 


The potential effects of grades as a 
reinforcing contingency were exam- 
ined. University students wrote essays 
defending positions on attitude related 
issues contrary to their previously as- 
sessed positions. Good and poor grades 
were randomly assigned to the essays 
and reported to the students. The ef- 
fect of these procedures on attitude 
change was evaluated, and good
grades were demonstrated to serve a 
reinforcing role in contrast to the ef- 
fects of a poor grade or no grade. 